 A simple shuffle that proved not so simple after all

Yesterday I found the following snippet of Ruby magic for shuffling an array:

```
class Array
  def shuffle
    sort { rand(3) - 1 }
  end
end

arr = (1..10).to_a
# => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

arr.shuffle
# => [1, 8, 6, 10, 9, 3, 7, 2, 5, 4]

arr.shuffle
# => [3, 7, 10, 4, 5, 8, 2, 6, 9, 1]
```

There is one thing wrong with it, but the original site no longer permits commenting on this entry, so I thought I’d share my thoughts here.

Of course, the above solution works OK: it returns a nicely shuffled array every time it is called. Its succinctness is admirable, and using `sort` to shuffle is a clever hack.

But does it work efficiently? No. Using `sort` means that the complexity of our `shuffle` method is O(n log n), the same as sorting. Is this a bad thing? No, as long as you only shuffle small arrays. But as soon as you use it to shuffle something huge, performance will suffer.
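To make that n log n growth visible, here is a quick experiment (my addition, not part of the original snippet): count how many times `sort` invokes its comparison block as the array grows.

```ruby
# Count the comparison-block calls made by sort. A linear algorithm
# would grow 10x per row; these counts grow faster, roughly like
# n * log2(n).
[10, 100, 1000].each do |n|
  arr = Array.new(n) { rand }
  calls = 0
  arr.sort { |a, b| calls += 1; a <=> b }
  puts "n = #{n}: #{calls} comparisons (n log2 n is roughly #{(n * Math.log2(n)).round})"
end
```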

On “Big O Notation” — a small interruption

If you wonder what the hell this O(n log n) is, you can check the Wikipedia entry on Big O Notation, but it’s very long and theoretical, so here is a short and hopefully easier explanation.

The Big O Notation is a tool used to analyze algorithms and estimate their efficiency, i.e. how fast they solve a given problem. It lets you compare, for example, various array sorting algorithms. To find the complexity of an algorithm, we must find out, or estimate, how many steps (operations that we can assume take roughly unit time) it takes to solve the problem, as a function of the problem size, usually called n.

So, saying “algorithm `A` is O(n log n), but algorithm `B` is O(n²)” means that when sorting an array of size n, algorithm `A` needs a number of steps proportional to n log n, and `B` needs a number of steps proportional to n².

But beware: “Big O” won’t give you measurements in seconds. The right question to ask is: “If n (the number of elements in the sorted array) increases, how much more time will the algorithm need?” This gives us a useful estimate: if we tried to sort a 10 times bigger array, algorithm `B` would need 100 times more time!

A hidden constant

There is one more important thing about these estimates: a hidden constant. As you may have noticed, the above paragraph doesn’t directly compare algorithms `A` and `B`. All we can estimate is how fast the execution time of an individual algorithm grows, given some increase of input size. The real execution time depends on so many factors (processor speed, operating system resources, programming language, quality of the algorithm’s actual implementation) that it cannot be reliably used to compare algorithms. So all those factors are “shoved under the carpet” in the form of a hidden constant.

If we assume that the input size grows sufficiently large, the impact of the hidden constant (no matter how big) can be safely ignored. How is that? Here’s an example: let’s assume that the hidden constant of `A` is C_A = 1000, because the algorithm runs on an ancient computer and was written not very efficiently in some slow language. On the other hand, we assume that C_B = 1, because it runs on a fast computer and was carefully hand-coded in assembly.

So, for n = 10, algorithm `A` needs 1000 * 10 * log 10 ≈ 40,000 steps, and algorithm `B` needs 1 * 10 * 10 = 100 steps (logarithms here are base 2, and the values are generously rounded). If we assume that completing 1000 steps takes 1 second, the execution times will be 40 s and 0.1 s, respectively. Looks like `B` is the winner? Wait a minute. Let’s see what happens when the value of n increases:

| n | time of `A` (C_A * n log n * 0.001 s) | time of `B` (C_B * n² * 0.001 s) |
|---|---|---|
| 10¹ = 10 | 40 s | 0.1 s |
| 10² = 100 | 700 s | 10 s |
| 10³ = 1,000 | 10,000 s | 1,000 s |
| 10⁴ = 10,000 | 140,000 s | 100,000 s |
| 10⁵ = 100,000 | 1,700,000 s | 10,000,000 s |
| 10⁶ = 1,000,000 | 20,000,000 s | 1,000,000,000 s |
| 10⁷ = 10,000,000 | 240,000,000 s | 100,000,000,000 s |

As we can see, from 100,000 elements onwards our rusty, ancient computer starts to beat the fast, modern one. And the difference at 10,000,000 (ten million) elements is truly devastating: 7.6 years versus 3,171 years! Three thousand years!
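The table above can be reproduced in a few lines of Ruby (my addition; I assume base-2 logarithms and the same rate of 1000 steps per second, so the exact values come out slightly below the generously rounded ones in the table):

```ruby
# Estimated running time of A (1000 * n * log2(n) steps) versus
# B (1 * n * n steps), at 1000 steps per second.
c_a = 1000
c_b = 1
(1..7).each do |p|
  n = 10**p
  time_a = c_a * n * Math.log2(n) * 0.001
  time_b = c_b * n * n * 0.001
  puts "n = 10^#{p}: A needs about #{time_a.round} s, B needs about #{time_b.round} s"
end
```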

Back to shuffling

As I already said, O(n log n) is not the best achievable complexity for something as simple as shuffling an array. Intuitively, the number of steps should be directly proportional to n, or, in Big O Notation, O(n). This is often called linear complexity.

So we need to find a linear shuffle — an algorithm that shuffles the array in one pass. Strictly speaking, it can do so in any number of passes, even 1000, as long as this number is constant.

What’s really funny is that many people try to find faster and faster implementations of shuffling, but as long as they use `sort`, their solutions remain flawed in terms of complexity. We can see that in a comment on the site. Likewise, most of the discussion on comp.lang.ruby focused on finding the fastest shuffling-by-sorting implementation; not a single post directly mentioned algorithmic complexity. At least some people proposed linear shuffle algorithms, but several of those had another problem: they were biased (which means that they weren’t truly random).

Another useful finding in the newsgroup discussion is the following link: http://c2.com/cgi/wiki?LinearShuffle. You should certainly pay it a visit if you want a better understanding of the various ways to do a linear shuffle and how to avoid biased results. It has code examples in many languages, but sadly no Ruby version, so we’ll try to create one here.

We want a method that’s both short and efficient. So I created the following method, using a slightly modified version of one of the algorithms from the aforementioned Wiki page:

```
class Array
  def shuffle
    arr = self.dup
    arr.size.downto 2 do |j|
      r = rand j
      arr[j-1], arr[r] = arr[r], arr[j-1]
    end
    arr
  end
end
```

Admittedly, it’s not as short as the `sort` version, but is it faster? There is only one loop in this version, which runs through the array (backwards, from the last element down to the second) and swaps the current element with a randomly chosen one at or before its position (possibly itself, which is required for an unbiased result). The first element is skipped because there is nothing left to swap it with. Our version has linear complexity, so it should be faster for sufficiently big values of n. Let’s find this n.

Benchmarking

Here is my benchmarking code, borrowed from the newsgroup, but with a few enhancements:

```
require 'benchmark'

class Array
  def shuffle
    arr = self.dup
    arr.size.downto 2 do |j|
      r = rand j
      arr[j-1], arr[r] = arr[r], arr[j-1]
    end
    arr
  end

  def shuffle_sort
    sort { rand(3) - 1 }
  end

  def shuffle_sort_by
    sort_by { rand }
  end
end

def gc
  GC.enable
  GC.start
  GC.disable
end

[10, 100, 1000, 10000, 100000, 1000000].each do |n|
  a = Array.new(n) { |i| i }
  k = 1000000 / n
  Benchmark.bm(25) do |bb|
    gc
    bb.report("shuffle (" + n.to_s + ")         ") do
      k.times do
        a.shuffle
      end
    end
    gc
    bb.report("shuffle_sort (" + n.to_s + ")    ") do
      k.times do
        a.shuffle_sort
      end
    end
    gc
    bb.report("shuffle_sort_by (" + n.to_s + ") ") do
      k.times do
        a.shuffle_sort_by
      end
    end
  end
end
```

First comes the `Array` class, enhanced with three new methods:

• `shuffle` is my implementation of the linear shuffle,
• `shuffle_sort` is taken from the snippet quoted above,
• `shuffle_sort_by` is probably the fastest implementation of a sort-based shuffle; it was proposed several times in the newsgroup discussion.

Next is a little utility function, `gc`, that runs the garbage collector and then disables it. We’ll call it between the benchmarked methods. This should minimize their interference with each other and also prevent the garbage collector from distorting our results.

Then comes the real benchmarking code. We try consecutive powers of ten as the array size `n`, from 10 to 1,000,000. For each `n`, we choose `k`, the number of times the shuffle method is called, such that `n * k = 1000000`. This means an array of 10 elements will be shuffled 100,000 times, but an array of 1,000,000 elements only once. The variable `k` acts as a normalization factor: if we chose a constant number instead (like 10 or 1,000,000), the measurements would either be fractions of a second for small arrays or take days for huge ones.

Results

Here are the results from my computer:

```
                               user     system      total        real
shuffle (10)               3.100000   0.360000   3.460000 (  3.548542)
shuffle_sort (10)          1.470000   0.010000   1.480000 (  1.500005)
shuffle_sort_by (10)       0.890000   0.000000   0.890000 (  0.892586)
                               user     system      total        real
shuffle (100)              2.960000   0.030000   2.990000 (  3.030168)
shuffle_sort (100)         3.730000   0.100000   3.830000 (  3.880254)
shuffle_sort_by (100)      1.180000   0.000000   1.180000 (  1.204529)
                               user     system      total        real
shuffle (1000)             2.930000   0.020000   2.950000 (  3.046020)
shuffle_sort (1000)        6.240000   0.200000   6.440000 (  6.553730)
shuffle_sort_by (1000)     1.620000   0.010000   1.630000 (  1.637530)
                               user     system      total        real
shuffle (10000)            2.940000   0.020000   2.960000 (  3.026883)
shuffle_sort (10000)       8.810000   0.380000   9.190000 (  9.353987)
shuffle_sort_by (10000)    2.040000   0.010000   2.050000 (  2.143209)
                               user     system      total        real
shuffle (100000)           3.040000   0.010000   3.050000 (  3.124412)
shuffle_sort (100000)     11.290000   0.120000  11.410000 ( 11.649619)
shuffle_sort_by (100000)   2.710000   0.020000   2.730000 (  2.800463)
                               user     system      total        real
shuffle (1000000)           3.090000   0.010000   3.100000 (  3.153007)
shuffle_sort (1000000)     13.870000   0.140000  14.010000 ( 14.473133)
shuffle_sort_by (1000000)   3.730000   0.030000   3.760000 (  3.857054)
```

Interpretation

We can notice several things in these results:

1. The `shuffle` method always takes about 3 s to run, while the other methods take more and more time. This confirms that `shuffle` has linear complexity and the others do not.
2. `shuffle_sort_by` is about 4 times faster than `shuffle_sort` for bigger arrays. This suggests that they both have the same complexity (in terms of Big O Notation) and differ only by a hidden constant.
3. `shuffle` is faster than `shuffle_sort` for 100 elements and more.
4. `shuffle` is faster than `shuffle_sort_by` for 1,000,000 elements and more. The difference at 1,000,000 is quite big, so I experimented a little and found that the “crossing point” is about 350,000 on my computer. For any array size greater than that, `shuffle` is the fastest method.

Conclusion

So, which method is the best? If you know that you will never ever shuffle anything bigger than a few hundred elements, use `shuffle_sort_by`. In any other case (especially in library code) I would recommend playing it safe and using `shuffle`. Even if you never throw anything huge at it, I doubt this method will be the bottleneck of your code. And as always, before you start optimizing through random tweaking, I recommend some profiling sessions or (at the very least) a benchmarking suite like the one above.

One could say that 350,000 (or whatever the number is on your computer) is quite a big array. That’s true, but let’s not forget that Ruby’s `sort` and `sort_by` are written in C and use the standard library’s `qsort` function, which is probably highly optimized, while our own `shuffle` is pure Ruby code. If we implemented our method in C, it would probably be the fastest from the start (and I would certainly try that experiment if this blog entry weren’t so long already).

And that’s also why the `sort` version is four times slower than the `sort_by` version: the former has to cross back from C into Ruby every time it compares two elements, while the latter calls its block just once per element and then stores the result.
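What `sort_by` does under the hood is the classic decorate-sort-undecorate trick (also known as the Schwartzian transform). A pure-Ruby sketch of the idea (my addition; the real implementation is in C, and the method name here is made up):

```ruby
class Array
  # Decorate each element with a random key (rand runs only n times,
  # once per element), sort on the precomputed keys (the comparisons
  # never re-enter an expensive user block), then strip the keys.
  def shuffle_sort_by_sketch
    map { |x| [rand, x] }
      .sort { |a, b| a[0] <=> b[0] }
      .map { |pair| pair[1] }
  end
end

p (1..10).to_a.shuffle_sort_by_sketch
```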

Just one more thing

While writing this article I experimented with various versions of the methods and discovered one more flaw in the original version (which I renamed `shuffle_sort`). I was wrong when I said “it returns a nicely shuffled array every time it is called.”

The results are not nicely shuffled at all. They are biased. Badly. That means that some permutations (i.e. orderings) of elements are more likely than others. Here’s another snippet of code to prove it, again borrowed from the newsgroup discussion:

```
N = 100000
A = %w(a b c)
Score = Hash.new { |h, k| h[k] = 0 }

N.times do
  sorted = A.shuffle
  Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
  puts "#{key}: #{Score[key]}"
end
```

This code shuffles an array of three elements (`a`, `b`, `c`) 100,000 times and records how many times each possible result came up. With three elements there are only 3! = 6 possible orderings, and we should get each one about 16,666.66 times. If we try an unbiased version of shuffle (`shuffle` or `shuffle_sort_by`), the results are as expected:

```abc: 16517
acb: 16893
bac: 16584
bca: 16568
cab: 16476
cba: 16962```

Of course, there are some deviations, but they shouldn’t exceed a few percent of the expected value, and they should be different each time we run this code. We can say that the distribution is even.
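“A few percent” can be made precise by computing each count’s relative error against the expected N/6. Here is a quick sketch (my addition); to keep the snippet self-contained it uses Ruby’s built-in `Array#shuffle` (available since 1.8.7, and itself an unbiased linear shuffle):

```ruby
# Measure the maximum relative deviation from a perfectly even
# distribution, using Ruby's built-in (unbiased) Array#shuffle.
n = 60_000
counts = Hash.new(0)
n.times { counts[%w(a b c).shuffle.join] += 1 }

expected = n / 6.0
max_dev = counts.values.map { |v| (v - expected).abs / expected }.max
puts format("maximum relative deviation: %.1f%%", max_dev * 100)
```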

OK, what happens if we use the `shuffle_sort` method?

```abc: 44278
acb: 7462
bac: 7538
bca: 3710
cab: 3698
cba: 33314```

This is not an even distribution at all. Let’s run it again:

```abc: 44444
acb: 7494
bac: 7320
bca: 3683
cab: 3611
cba: 33448```

As we can clearly see, the `shuffle_sort` method favors the `abc` ordering (44% of results), then `cba` (33%), then `acb` and `bac` (about 7% each), then `bca` and `cab` (less than 4% each). The results for other array sizes are biased as well (I checked sizes 2 through 5). Imagine this algorithm used in a lottery!

10 responses to “A simple shuffle that proved not so simple after all”

• Jetru

Superb! One of the better blogs i’ve come across! I’m a regular! :) And awesome site design :D

• Tarun

Sir, Please tell me how do we prove that one algorithm is faster than the other by finding out the hidden constants

• szeryf

Tarun: I’m sorry but you must have got that entirely backwards. The hidden constant is hidden because it doesn’t matter when you compare two algorithms. You don’t have to know how big it is.

Consider two functions: F(n) = A * n and G(n) = B * n * n, where A and B could be arbitrarily large, but constant numbers. Do you need to know A and B values to tell that given n big enough, G(n) > F(n)?

Or, more formally, we can easily prove that there is some N that for every n > N, G(n) > F(n) holds. Which means that it holds for “almost all” numbers. And this is enough for algorithmic theory to say “G is slower(bigger) than F” because the theory compares their asymptotic complexities, that is where n increases to infinity.

• Daniel Bernier

Very nice explanation & illustration of Big O notation…those benchmark numbers lay it out pretty clearly.

delicious’ed.

• not-just-yeti

Maybe I’m an idiot, but is it obvious that
arr.sort{rand(3)-1}
really gives a uniform distribution? It seems to depend on the details of the sort.

(For example, if we were to use bubblesort, the probability that the first element ends up last is only 1/2^n, instead of 1/n. So for quicksort, I need some convincing that the results really are uniform.)

(Also, at the very least, that cute-but-inefficient version would want to use
arr.sort{ 2*rand(2)-1 }
)

• szeryf

not-just-yeti: it doesn’t. If you read the last part of this article (under “just one more thing”) you’ll see that I show that the results are not uniform.

Also see http://www.codinghorror.com/blog/archives/001015.html if you want more proof.

• not-just-yeti

Ah, thanks.

After a bunch of thinking, it can be proved that *no* sort algorithm will make shuffle_sort work (meaning that this isn’t just the details of quicksort that muck things up; it won’t work for mergesort or anything else):
If the sort algorithm makes k comparisons (i.e. k calls to rand), then there are 2^k possible executions, which can’t be *uniformly* mapped onto the n! possible shuffled outputs.

(As I write that, I realize it’s the same idea made about a non-sorting algorithm, on the post you linked to.)

• Student

Style cop!

```
class Array
  def shuffle!
    (1...size).each do |j|
      r = rand(j+1)
      self[r], self[j] = at(j), at(r)
    end
    self
  end
  def shuffle
    dup.shuffle!
  end
end
```
• `at` is supposed to be a bit faster than `[]`, but it is rhs only.
• minimize the arithmetic inside the loop.
• minimize the arithmetic outside the loop.
• allow the user to decide if he wants to pay for the `dup` or not.