Swapping variables with XOR

February 10th, 2014

Thanks to @int_SiPlus_void the following tweet made it’s way onto my timeline today:

"Another XOR trick is swapping 2 values without a 3rd variable. This is a great trick if you want to reduce the readability of your code."
— Leonard Ritter (@paniq) February 10, 2014

I’d actually never heard of this before until @dannolan pointed me to this Wikipedia page on the XOR swap algorithm.

The basics are pretty simple. To swap two variables ‘x’ and ‘y’, perform the following operations:

x = x XOR y
y = x XOR y
x = x XOR y

To test this out I put together a basic test project in Xcode, and it worked.

The next question then is whether or not this is actually any faster than the traditional swap, which is as follows:

temp = x
x = y
y = temp

To test this I wrote a dylib that expose two functions, traditional_swap and xor_swap. These were built with -O3 in Xcode 5.0.2 (5A3005) using:

Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.2
Thread model: posix

The two functions are as follows:

void traditional_swap(int * x, int * y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

void xor_swap(int * x, int * y)
{
    *x = *x ^ *y;
    *y = *x ^ *y;
    *x = *x ^ *y;
};

To test this I wrote a basic benchmark executable that invokes the functions from the dylib:

#include <stdio.h>
#include <time.h>

void xor_swap(int * x, int * y);
void traditional_swap(int * x, int * y);

int main(int argc, const char * argv[])
{
    void (*swapFn)(int * x, int * y);

    // Uncomment a line to use it in the benchmark
//  swapFn = xor_swap;
//  swapFn = traditional_swap;

    clock_t start = clock();

    int cRuns = 2500000;
    for (int i = 0; i < cRuns; i++)
    {
        int x = 88, y = 72;
        swapFn(&x, &y);
    }

    clock_t end = clock();
    fprintf(stderr, "time taken: %lu\n", end - start);
    fprintf(stderr, "avg time: %f\n", (end - start) / (double)cRuns);

    return 0;
}

I’d never used clock_t and clock() before, but given the results I assume they return a time representation in milliseconds. (Thank you Stack Overflow.)

The numbers are very similar each time the program is run, except the first time. I don’t know what’s slower about the first time, but I’d assume it’s some OS-level thing I can’t control.

Here’s the output for xor_swap:

time taken: 7116
avg time: 0.002846
Program ended with exit code: 0

Here’s the output for traditional_swap:

time taken: 6434
avg time: 0.002574
Program ended with exit code: 0

As it currently stands, xor_swap is roughly 10% slower than traditional_swap. This is also supported by the assembler that clang generated, adapted from Hopper:

_traditional_swap_f70:
push rbp
mov rbp, rsp
mov eax, dword [ds:rdi]
mov ecx, dword [ds:rsi]
mov dword [ds:rdi], ecx
mov dword [ds:rsi], eax
pop rbp
ret     

_xor_swap_f80:
push rbp
mov rbp, rsp
mov eax, dword [ds:rsi]
xor eax, dword [ds:rdi]
mov dword [ds:rdi], eax
xor eax, dword [ds:rsi]
mov dword [ds:rsi], eax
xor dword [ds:rdi], eax
pop rbp
ret

Ignoring the basic setup and teardown of each method, traditional_swap is fewer instructions and thus I would expect it to execute quicker (though I understand this is not neccesarily the always case).

traditional_swap consists of four movs, which I found interesting as I would logically expect 3 movs.

xor_swap consists of three sets of mov xor, which seems to make sense but I honestly don’t know too much assembler so I’ll have to trust the compiler on this one.

I wonder if either method could be improved with hand-written assembler, but perhaps I’ll leave that for another time.