Swapping variables with XOR
February 10th, 2014
Thanks to @int_SiPlus_void the following tweet made it’s way onto my timeline today:
"Another XOR trick is swapping 2 values without a 3rd variable. This is a great trick if you want to reduce the readability of your code."
— Leonard Ritter (@paniq) February 10, 2014
I’d actually never heard of this before until @dannolan pointed me to this Wikipedia page on the XOR swap algorithm.
The basics are pretty simple. To swap two variables ‘x’ and ‘y’, perform the following operations:
x = x XOR y
y = x XOR y
x = x XOR y
To test this out I put together a basic test project in Xcode, and it worked.
The next question then is whether or not this is actually any faster than the traditional swap, which is as follows:
temp = x
x = y
y = temp
To test this I wrote a dylib that expose two functions, traditional_swap
and xor_swap
. These were built with -O3
in Xcode 5.0.2 (5A3005) using:
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.2
Thread model: posix
The two functions are as follows:
void traditional_swap(int * x, int * y)
{
int temp = *x;
*x = *y;
*y = temp;
}
void xor_swap(int * x, int * y)
{
*x = *x ^ *y;
*y = *x ^ *y;
*x = *x ^ *y;
};
To test this I wrote a basic benchmark executable that invokes the functions from the dylib:
#include <stdio.h>
#include <time.h>
void xor_swap(int * x, int * y);
void traditional_swap(int * x, int * y);
int main(int argc, const char * argv[])
{
void (*swapFn)(int * x, int * y);
// Uncomment a line to use it in the benchmark
// swapFn = xor_swap;
// swapFn = traditional_swap;
clock_t start = clock();
int cRuns = 2500000;
for (int i = 0; i < cRuns; i++)
{
int x = 88, y = 72;
swapFn(&x, &y);
}
clock_t end = clock();
fprintf(stderr, "time taken: %lu\n", end - start);
fprintf(stderr, "avg time: %f\n", (end - start) / (double)cRuns);
return 0;
}
I’d never used clock_t
and clock()
before, but given the results I assume they return a time representation in milliseconds. (Thank you Stack Overflow.)
The numbers are very similar each time the program is run, except the first time. I don’t know what’s slower about the first time, but I’d assume it’s some OS-level thing I can’t control.
Here’s the output for xor_swap
:
time taken: 7116
avg time: 0.002846
Program ended with exit code: 0
Here’s the output for traditional_swap
:
time taken: 6434
avg time: 0.002574
Program ended with exit code: 0
As it currently stands, xor_swap
is roughly 10% slower than traditional_swap
. This is also supported by the assembler that clang generated, adapted from Hopper:
_traditional_swap_f70:
push rbp
mov rbp, rsp
mov eax, dword [ds:rdi]
mov ecx, dword [ds:rsi]
mov dword [ds:rdi], ecx
mov dword [ds:rsi], eax
pop rbp
ret
_xor_swap_f80:
push rbp
mov rbp, rsp
mov eax, dword [ds:rsi]
xor eax, dword [ds:rdi]
mov dword [ds:rdi], eax
xor eax, dword [ds:rsi]
mov dword [ds:rsi], eax
xor dword [ds:rdi], eax
pop rbp
ret
Ignoring the basic setup and teardown of each method, traditional_swap
is fewer instructions and thus I would expect it to execute quicker (though I understand this is not neccesarily the always case).
traditional_swap
consists of four mov
s, which I found interesting as I would logically expect 3 mov
s.
xor_swap
consists of three sets of mov xor
, which seems to make sense but I honestly don’t know too much assembler so I’ll have to trust the compiler on this one.
I wonder if either method could be improved with hand-written assembler, but perhaps I’ll leave that for another time.