# Coding Range

## Swapping variables with XOR

February 10th, 2014

Thanks to @int_SiPlus_void the following tweet made it’s way onto my timeline today:

I’d actually never heard of this before until @dannolan pointed me to this Wikipedia page on the XOR swap algorithm.

The basics are pretty simple. To swap two variables ‘x’ and ‘y’, perform the following operations:

``````x = x XOR y
y = x XOR y
x = x XOR y
``````

To test this out I put together a basic test project in Xcode, and it worked.

The next question then is whether or not this is actually any faster than the traditional swap, which is as follows:

``````temp = x
x = y
y = temp
``````

To test this I wrote a dylib that expose two functions, `traditional_swap` and `xor_swap`. These were built with `-O3` in Xcode 5.0.2 (5A3005) using:

``````Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.2
``````

The two functions are as follows:

``````void traditional_swap(int * x, int * y)
{
int temp = *x;
*x = *y;
*y = temp;
}

void xor_swap(int * x, int * y)
{
*x = *x ^ *y;
*y = *x ^ *y;
*x = *x ^ *y;
};
``````

To test this I wrote a basic benchmark executable that invokes the functions from the dylib:

``````#include <stdio.h>
#include <time.h>

void xor_swap(int * x, int * y);
void traditional_swap(int * x, int * y);

int main(int argc, const char * argv[])
{
void (*swapFn)(int * x, int * y);

// Uncomment a line to use it in the benchmark
//  swapFn = xor_swap;

clock_t start = clock();

int cRuns = 2500000;
for (int i = 0; i < cRuns; i++)
{
int x = 88, y = 72;
swapFn(&x, &y);
}

clock_t end = clock();
fprintf(stderr, "time taken: %lu\n", end - start);
fprintf(stderr, "avg time: %f\n", (end - start) / (double)cRuns);

return 0;
}
``````

I’d never used `clock_t` and `clock()` before, but given the results I assume they return a time representation in milliseconds. (Thank you Stack Overflow.)

The numbers are very similar each time the program is run, except the first time. I don’t know what’s slower about the first time, but I’d assume it’s some OS-level thing I can’t control.

Here’s the output for `xor_swap`:

``````time taken: 7116
avg time: 0.002846
Program ended with exit code: 0
``````

Here’s the output for `traditional_swap`:

``````time taken: 6434
avg time: 0.002574
Program ended with exit code: 0
``````

As it currently stands, `xor_swap` is roughly 10% slower than `traditional_swap`. This is also supported by the assembler that clang generated, adapted from Hopper:

``````_traditional_swap_f70:
push rbp
mov rbp, rsp
mov eax, dword [ds:rdi]
mov ecx, dword [ds:rsi]
mov dword [ds:rdi], ecx
mov dword [ds:rsi], eax
pop rbp
ret

_xor_swap_f80:
push rbp
mov rbp, rsp
mov eax, dword [ds:rsi]
xor eax, dword [ds:rdi]
mov dword [ds:rdi], eax
xor eax, dword [ds:rsi]
mov dword [ds:rsi], eax
xor dword [ds:rdi], eax
pop rbp
ret
``````

Ignoring the basic setup and teardown of each method, `traditional_swap` is fewer instructions and thus I would expect it to execute quicker (though I understand this is not neccesarily the always case).

`traditional_swap` consists of four `mov`s, which I found interesting as I would logically expect 3 `mov`s.

`xor_swap` consists of three sets of `mov xor`, which seems to make sense but I honestly don’t know too much assembler so I’ll have to trust the compiler on this one.

I wonder if either method could be improved with hand-written assembler, but perhaps I’ll leave that for another time.