Yesterday I found following snippet of Ruby magic to shuffle an array:

class Array def shuffle sort { rand(3) - 1 } end end arr = (1..10).to_a # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] arr.shuffle # => [1, 8, 6, 10, 9, 3, 7, 2, 5, 4] arr.shuffle # => [3, 7, 10, 4, 5, 8, 2, 6, 9, 1]

There is one thing wrong with it, but the original site no longer permits commenting on this entry, so I thought I’d share my thoughts here.

Of course, the above solution works OK: it returns a nicely shuffled array every time it is called. Its succinctness is also admirable. Using `sort`

to actually shuffle is also a clever hack.

But does it work in an efficient way? No. Using `sort`

means that complexity of our `shuffle`

method is *O(n log n)*, just as sorting. Is this a bad thing? No, as long as you shuffle only small arrays. But as soon as you use it to shuffle something huge, the performance will suffer.

# On “Big O Notation” — a small interruption

If you wonder what the hell is this *O(n log n)*, you can check the Wikipedia entry on Big O Notation, but it’s very long and theoretical, so here is a short and hopefully easier explanation.

The Big O Notation is a tool used to analyze algorithms and estimate their efficiency, i.e. how fast they solve given problem. Thus you can compare e.g. various array sorting algorithms. To find the complexity of an algorithm, we must find out, or estimate, how many steps (operations that we can assume take roughly unit time) does it take to solve problem as a function of problem size, usually called *n*.

So, saying “algorithm `A`

is *O(n log n)*, but algorithm `B`

is *O(n ^{2})*” means that when sorting an array of size

*n*, algorithm

`A`

needs an amount of steps proportional to *n long n*, and

`B`

needs an amount of steps proportional to *n*.

^{2}But beware, “Big O” won’t give you measurements in seconds. The right question to ask is: “If *n* (the number of elements in sorted array) would increase, how much more time would the algorithm need?” This gives us a useful estimation: if we tried to sort a 10 times bigger array, algorithm `B`

would need 100 times more time!

# A hidden constant

There is one more important thing about these estimations: a **hidden constant**. As you may have noticed, the above paragraph doesn’t directly compare algorithms `A`

and `B`

. All we can estimate, is how fast will execution time grow of individual algorithm, given some increase of input size. The real execution time is dependent on so many factors (like processor speed, operating system resources, programming language, quality of algorithm’s actual implementation) that it cannot be reliably used to compare algorithms. So all those factors are “shoved under the carpet” in the form of a hidden constant.

If we assume that input size grows sufficiently large, the impact of a hidden constant (no matter how big) can be safely ignored. How is that? Here’s some example: let’s assume that hidden constant of `A`

, *C*_{A} = 1000, because the algorithm runs on an ancient computer and was written not very efficiently in some slow language. On the other hand, we assume that *C*_{B} = 1, because it runs on a fast computer and was carefully hand-coded in assembly.

So, for *n* = 10, algorithm `A`

needs 1000 * 10 * log 10 = 40,000 steps, and algorithm `B`

needs 1 * 10 * 10 = 100 steps. If we assume that completing 1000 steps takes 1 second, the execution times will be 40s and 0.1s, respectively. Looks like `B`

is the winner? Wait a minute. Let’s see what will happen, when the value of *n* increases:

n |
time of `A` (Cs)_{A} * n log n * 0.001 |
time of `B` (Cs)_{B} * n^{2} * 0.001 |
---|---|---|

10^{1} = 10 |
40s | 0.1s |

10^{2} = 100 |
700s | 10s |

10^{3} = 1,000 |
10,000s | 1,000s |

10^{4} = 10,000 |
140,000s | 100,000s |

10^{5} = 100,000 |
1,700,000s | 10,000,000s |

10^{6} = 1,000,000 |
20,000,000s | 1,000,000,000s |

10^{7} = 10,000,000 |
240,000,000s | 100,000,000,000s |

As we can see, for 100,000 elements and more our rusty, ancient computer starts to win with our fast and modern one. And the difference for 10,000,000 (ten million) elements is truly devastating: 7.6 years versus 3171 years! Three thousand years!

# Back to shuffling

As I already said, *O(n log n)* is not the best complexity for such a simple algorithm like shuffling an array. Intuitively, the number of steps should be directly proportional to *n*, or — in Big O Notation — *O(n)*. This is often called *linear complexity*.

So we need to find a *linear shuffle* — an algorithm that shuffles the array in one pass. Strictly speaking, it can do so in any number of passes, even 1000, as long as this number is constant.

What’s really funny is that many people try to find faster and faster implementations of

shuffling, but as long as they are using `sort`

, their solutions will be flawed, in terms of complexity. We can see that in a comment on the site. Also, most of the discussion on comp.lang.ruby was focused on finding fastest shuffling-by-sorting implementation. There was not a single post directly mentioning algorithm complexity. At least some people proposed linear shuffle algorithms, but several of them had another problem: they were biased (which means that they weren’t truly random).

Another useful finding in the newsgroup discussion is following link: http://c2.com/cgi/wiki?LinearShuffle. You should certainly pay a visit to this site if you want to have a better understanding of various way to do the linear shuffle and how to avoid biased results. It has many examples of code in many languages, but sadly no Ruby version, so we’ll try to create one here.

We want to have a method that’s both short and efficient. So, I created following method using slightly modified version of one of the algorithms from the before mentioned Wiki page:

class Array def shuffle arr = self.dup arr.size.downto 2 do |j| r = rand j arr[j-1], arr[r] = arr[r], arr[j-1] end arr end end

Admittedly, it’s not as short as the `sort`

-version, but is it faster? There is only one loop in this version, which runs through the array (backwards, from last element to second) and swaps current element with a random one that’s before it. First element is skipped because there is nothing to swap it with. Our version has linear complexity, so it *should* be faster *for sufficiently big values of n*. Let’s find this *n*.

# Benchmarking

Here is my benchmarking code, borrowed from the newsgroup, but with a few enhancements:

require 'benchmark' class Array def shuffle arr = self.dup arr.size.downto 2 do |j| r = rand j arr[j-1], arr[r] = arr[r], arr[j-1] end arr end def shuffle_sort sort {rand(3) - 1} end def shuffle_sort_by sort_by {rand} end end def gc GC.enable GC.start GC.disable end [10, 100, 1000, 10000, 100000, 1000000].each do |n| a = Array.new(n) { |i| i } k = 1000000 / n Benchmark.bm(25) do |bb| gc bb.report("shuffle ("+n.to_s+") ") do k.times do a.shuffle end end gc bb.report("shuffle_sort ("+n.to_s+") ") do k.times do a.shuffle_sort end end gc bb.report("shuffle_sort_by ("+n.to_s+") ") do k.times do a.shuffle_sort_by end end end end

First comes `Array`

class, enhanced with a three new methods:

`shuffle`

is my implementation of linear shuffle,`shuffle_sort`

is taken from above mentioned snippet,`shuffle_sort_by`

is probably the fastest implementation of sort based shuffle, it was proposed several times in newsgroup discussion.

Next is a little utility function `gc`

that runs garbage collector and then disables it. We’ll call it between the benchmarked methods. This should presumably minimize their interference on each other and also prevent garbage collector from impacting our results.

Then comes the real benchmarking code. We try consecutive powers of ten as array size `n`

, from 10 to 1,000,000. For each `n`

, we choose `k`

as the number of times the shuffle method is called, such that `n*k = 1000000`

. This means that array of 10 elements will be shuffled 100,000 times, but array of 1,000,000 only one time. Variable `k`

acts as a normalization factor: if we chose a constant number instead (like 10 or 1,000,000), the measurements would either be fractions of seconds for small arrays or days for huge ones.

# Results

Here are the results from my computer:

user system total real shuffle (10) 3.100000 0.360000 3.460000 ( 3.548542) shuffle_sort (10) 1.470000 0.010000 1.480000 ( 1.500005) shuffle_sort_by (10) 0.890000 0.000000 0.890000 ( 0.892586) user system total real shuffle (100) 2.960000 0.030000 2.990000 ( 3.030168) shuffle_sort (100) 3.730000 0.100000 3.830000 ( 3.880254) shuffle_sort_by (100) 1.180000 0.000000 1.180000 ( 1.204529) user system total real shuffle (1000) 2.930000 0.020000 2.950000 ( 3.046020) shuffle_sort (1000) 6.240000 0.200000 6.440000 ( 6.553730) shuffle_sort_by (1000) 1.620000 0.010000 1.630000 ( 1.637530) user system total real shuffle (10000) 2.940000 0.020000 2.960000 ( 3.026883) shuffle_sort (10000) 8.810000 0.380000 9.190000 ( 9.353987) shuffle_sort_by (10000) 2.040000 0.010000 2.050000 ( 2.143209) user system total real shuffle (100000) 3.040000 0.010000 3.050000 ( 3.124412) shuffle_sort (100000) 11.290000 0.120000 11.410000 ( 11.649619) shuffle_sort_by (100000) 2.710000 0.020000 2.730000 ( 2.800463) user system total real shuffle (1000000) 3.090000 0.010000 3.100000 ( 3.153007) shuffle_sort (1000000) 13.870000 0.140000 14.010000 ( 14.473133) shuffle_sort_by (1000000) 3.730000 0.030000 3.760000 ( 3.857054)

# Interpretation

We can notice several things in these results:

`shuffle`

method takes always about 3s to run, while other methods take more and more time. This proves that`shuffle`

has linear complexity and others have not.`shuffle_sort_by`

is about 4 times faster than`shuffle_sort`

for bigger arrays. This proves that they both have the same complexity (in terms of Big O Notation) and differ only by a hidden constant.`shuffle`

is faster than`shuffle_sort`

for 100 elements and more.`shuffle`

is faster than`shuffle_sort_by`

for 1,000,000 elements and more. The difference for 1,000,000 is quite big, so I experimented a little bit and found out that the “crossing point” is about 350,000 on my computer. For any array size greater than 350,000,`shuffle`

is the fastest method.

# Conclusion

So, which method is the best? If you **know** that you will never ever shuffle anything bigger than a few hundred elements, use `shuffle_sort_by`

. In any other case (especially in a library code etc.) I would recommend playing safe and using `shuffle`

. Even if you never throw at it anything huge, I doubt this method will be a bottleneck of your code. And as always, before you start optimizing through random tweaking, I recommend some profiling sessions or (at the very least) a benchmarking suite like the one above.

One could say that 350,000 (or whatever number it is on your computer) is quite a big number, especially for an array size. That’s true, but let’s not forget, that Ruby’s `sort`

and `sort_by`

are written in C and use its standard library `qsort`

function, which is probably highly optimized, while our own `shuffle`

is pure Ruby code. If we implemented our method in C, it would be probably fastest from the start (and I would certainly do such an experiment if this blog entry weren’t so long already).

And that’s why `sort`

-version is four times slower than `sort_by`

-version. The former has to go back from C to Ruby each time it has to compare two elements, the latter just calls its block once for every element and then stores the result.

# Just one more thing

While writing this article I experimented with various versions of the methods and discovered one more flaw in the original version (renamed by me to `shuffle_sort`

). I was wrong when I said “it returns a nicely shuffled array every time it is called.”

The results are not nicely shuffled at all. They are biased. Badly. That means that some permutations (i.e. orderings) of elements are more likely than others. Here’s another snippet of code to prove it, again borrowed from the newsgroup discussion:

N = 100000 A = %w(a b c) Score = Hash.new { |h, k| h[k] = 0 } N.times do sorted = A.shuffle Score[sorted.join("")] += 1 end Score.keys.sort.each do |key| puts "#{key}: #{Score[key]}" end

This code shuffles 100,000 times array of three elements: `a`

, `b`

, `c`

and records how many times each possible result was achieved. In this case, there are only six possible orderings and we should got each one about 16666.66 times. If we try an unbiased version of shuffle (`shuffle`

or `shuffle_sort_by`

), the result are as expected:

abc: 16517 acb: 16893 bac: 16584 bca: 16568 cab: 16476 cba: 16962

Of course, there are some deviations, but they shouldn’t exceed a few percent of expected value and they should be different each time we run this code. We can say that the distribution is even.

OK, what happens if we use the `shuffle_sort`

method?

abc: 44278 acb: 7462 bac: 7538 bca: 3710 cab: 3698 cba: 33314

This is not an even distribution at all. Again?

abc: 44444 acb: 7494 bac: 7320 bca: 3683 cab: 3611 cba: 33448

As we can clearly see, the `shuffle_sort`

method favorizes `abc`

ordering (44% of results), then `cba`

(33%), then `acb`

and `bac`

(about 7%), then `bca`

and `cab`

(less than 4%). The results for other array sizes are biased as well (I checked from 2 to 5). Imagine this algorithm used in a lottery!

December 20th, 2007 at 22:01:17

Superb! One of the better blogs i’ve come across! I’m a regular! :) And awesome site design :D

December 23rd, 2007 at 21:02:46

Cool topic! ;)

January 5th, 2008 at 03:32:39

Sir, Please tell me how do we prove that one algorithm is faster than the other by finding out the hidden constants

January 5th, 2008 at 12:21:20

Tarun: I’m sorry but you must have got that entirely backwards. The hidden constant is hidden because it doesn’t matter when you compare two algorithms. You don’t have to know how big it is.

Consider two functions: F(n) = A * n and G(n) = B * n * n, where A and B could be arbitrarily large, but constant numbers. Do you need to know A and B values to tell that given n big enough, G(n) > F(n)?

Or, more formally, we can easily prove that there is some N that for every n > N, G(n) > F(n) holds. Which means that it holds for “almost all” numbers. And this is enough for algorithmic theory to say “G is slower(bigger) than F” because the theory compares their asymptotic complexities, that is where n increases to infinity.

January 10th, 2008 at 20:30:00

Very nice explanation & illustration of Big O notation…those benchmark numbers lay it out pretty clearly.

delicious’ed.

March 21st, 2008 at 12:18:03

[…] Update and this is too! […]

March 21st, 2008 at 23:18:09

Maybe I’m an idiot, but is it obvious that

arr.sort{rand(3)-1}

really gives a uniform distribution? It seems to depend on the details of the sort.

(For example, if we were to use bubblesort, the probability that the first element ends up last is only 1/2^n, instead of 1/n. So for quicksort, I need some convincing that the results really are uniform.)

(Also, at the very least, that cute-but-inefficient version would want to use

arr.sort{ 2*rand(2)-1 }

)

March 21st, 2008 at 23:25:36

not-just-yeti: it doesn’t. If you read the last part of this article (under “just one more thing”) you’ll see that I show that the results are not uniform.

Also see http://www.codinghorror.com/blog/archives/001015.html if you want more proof.

March 22nd, 2008 at 13:46:40

Ah, thanks.

After a bunch of thinking, it can be proved that *no* sort algorithm will make shuffle_sort work (meaning that this isn’t just the details of quicksort that muck things up; it won’t work for mergesort or anything else):

If the sort algorithm makes k comparisons (i.e. k calls to rand), then there are 2^k possible executions, which can’t be *uniformly* mapped onto the n! possible shuffled outputs.

(As I write that, I realize it’s the same idea made about a non-sorting algorithm, on the post you linked to.)

January 15th, 2009 at 06:04:08

Style cop!

`class Array`

def shuffle!

(1...size).each do |j|

r = rand(j+1)

self[r], self[j] = at(j), at(r)

end

self

end

def shuffle

dup.shuffle!

end

end

`at`

is supposed to be a bit faster than`[]`

, but it is rhs only.minimize the arithmetic inside the loop.

minimize the arithmetic outside the loop.

allow the user to decide if he wants to pay for the

`dup`

or not.