24 Jun 2020

Algorithm: Timsort

Timsort is a hybrid sorting algorithm designed by Tim Peters for the Python language, but now also used in Java and Android, amongst others. Hybrid means that multiple sorting techniques are blended into one algorithm for good real-world performance. The main ingredients are Insertion sort and Mergesort. On top of that, Timsort identifies and exploits pre-existing runs in the input data.

The panel above shows a random array with runs highlighted in colour. We have tweaked the random array generator a little to produce more and longer runs than purely random data would contain. Runs are parts of the array that are already sorted as desired. Ascending runs are shown in green and descending runs are shown in red. Such runs are often present in arrays to be sorted in practice, depending on the type of data. In some cases only a few individual values are out of place.



Finding runs

This panel shows how the runs are identified. This is a pretty straightforward step: simply compare each element to the previous one and see whether it continues the current run or ends it. The bold bars superimposed on the array mark the boundaries of runs.
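
As a rough illustration, here is what run detection might look like in Python. This is a sketch, not the code behind the animation; the function name find_run and its return convention (end index plus a descending flag) are my own.

```python
def find_run(a, start):
    """Find the run beginning at index `start`.

    Returns (end, descending), where a[start:end] is the run and
    `descending` tells whether it is strictly decreasing."""
    n = len(a)
    end = start + 1
    if end == n:                      # a single trailing element is a run of length 1
        return end, False
    if a[end] < a[start]:             # strictly descending run
        while end < n and a[end] < a[end - 1]:
            end += 1
        return end, True
    else:                             # ascending run (equal neighbours allowed)
        while end < n and a[end] >= a[end - 1]:
            end += 1
        return end, False
```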



Reversing runs

Once the runs are identified, there is another simple and fast step to perform: reversing all runs that are counter to the desired sort order, in this case the descending runs. The panel above shows an array with the runs already identified, and proceeds to reverse the descending runs. Note that in descending runs, successive equal values are never counted as part of the same run: if they were, reversing the run would swap their relative order. Excluding them is the first step that makes Timsort ‘stable’, meaning that the pre-existing order of equal values is preserved.
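
A sketch of this step, continuing the hypothetical find_run helper above. Because descending runs are strictly decreasing, reversing them in place cannot reorder equal values.

```python
def reverse_run(a, start, end):
    """Reverse a[start:end] in place (end is exclusive)."""
    lo, hi = start, end - 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1

# Usage: turn a descending run into an ascending one.
# end, descending = find_run(a, start)
# if descending:
#     reverse_run(a, start, end)
```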


[Animated panel: Ready to sort · Finding runs · Reversing runs · Merging runs · Sorting complete]

Natural Mergesort

At this point it is possible to sort the entire array, without any further Timsort magic, simply by repeatedly merging adjacent runs (elements outside any run count as runs of length one). We call this ‘natural Mergesort’. This panel displays the whole process, starting with an unsorted array, finding and possibly reversing runs, all the way to merging the runs to obtain a sorted array. This was just an aside, though: on the following panels we continue with the Timsort strategy from the point after the runs have been reversed.
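
A compact sketch of natural Mergesort, reusing the hypothetical find_run and reverse_run helpers from above. Merging the runs pairwise, pass by pass, is just one of several reasonable ways to organise the merges.

```python
def merge(left, right):
    """Stable two-way merge of two sorted lists (ties favour the left run)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j]); j += 1
        else:
            out.append(left[i]); i += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def natural_mergesort(a):
    """Sort `a` by finding natural runs and merging adjacent runs pairwise."""
    runs, i = [], 0
    while i < len(a):
        end, descending = find_run(a, i)
        if descending:
            reverse_run(a, i, end)
        runs.append((i, end))
        i = end
    while len(runs) > 1:
        merged = []
        for j in range(0, len(runs) - 1, 2):
            lo, mid = runs[j]
            _, hi = runs[j + 1]
            a[lo:hi] = merge(a[lo:mid], a[mid:hi])
            merged.append((lo, hi))
        if len(runs) % 2:             # odd run out is carried into the next pass
            merged.append(runs[-1])
        runs = merged
    return a
```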


[Animated panel: Ready to sort · Finding run · Reversing run · Insertion sort on run · Steps complete]

Minrun

Timsort has a parameter called ‘minrun’. This is the minimum size a run should have before moving on to the next stage of merging runs. Runs shorter than this are boosted by inserting the input elements that follow them, one by one, using Insertion sort until the minimum size is reached. This is very efficient, because Insertion sort is fast for small input sizes.
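
A sketch of boosting a short run to minrun with a swap-based Insertion sort, which is the variant the animation uses (CPython uses binary Insertion sort instead, as mentioned further below); the function name and parameters are my own.

```python
def extend_run(a, start, end, force):
    """a[start:end] is already sorted ascending; insert the following
    elements one by one until the run reaches index `force`
    (typically start + minrun, capped at len(a))."""
    for i in range(end, force):
        j = i
        while j > start and a[j] < a[j - 1]:   # swap downwards into place
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1
    return force                               # new end of the run
```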

Minrun is chosen as a function of the input size, at the start of the algorithm, and is at least 32. Here we have lowered this value to 11, in order to illustrate the operation of Timsort on the small arrays we can display here. The last run may of course have fewer elements than minrun.
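
For reference, here is a Python transcription of the classic CPython minrun computation (the animation overrides the result with 11, as mentioned): n is halved until it drops below 64, and 1 is added if any halving step lost a bit, so that the number of elements divided by minrun is close to a power of two.

```python
def compute_minrun(n):
    """Classic CPython rule: returns n itself for n < 64,
    otherwise a value between 32 and 64."""
    r = 0
    while n >= 64:
        r |= n & 1    # remember whether any halving step loses a bit
        n >>= 1
    return n + r
```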


[Animated panel: Ready to sort · Finding run · Reversing run · Insertion sort on run · Merging high · Merging low · Sort complete]

Timsort

Here we display the entire operation of Timsort on a larger array of 71 values, for better illustration. After finding and possibly reversing runs, and using Insertion sort to grow them to at least minrun (= 11), the final step of Timsort is carried out: merging. A stack, shown here at the bottom right, keeps track of the positions and lengths of the runs found so far but not yet merged, in the order they appear in the array. As new runs are discovered, they are pushed onto the top of the stack.

Timsort maintains two invariants on this stack: the sum of the lengths of the two most recent runs must be smaller than the length of the run below them, and the length of the most recent run must be smaller than that of its predecessor. When one of the invariants is violated, neighbouring runs are merged into one, as in Mergesort, and the stack is updated accordingly. To merge, the smaller of the two runs is temporarily moved into temp space, shown here at the bottom left. Depending on where the resulting free space lies, Timsort merges either ‘low’ or ‘high’, both of which can be observed on this panel with enough patience.
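
A sketch of how the merge stack might enforce these two invariants, reusing the hypothetical merge helper from the natural Mergesort sketch above; runs are stored as (start, length) pairs with the most recent run last. CPython's merge_collapse additionally checks one run deeper into the stack, which is omitted here.

```python
def merge_at(a, runs, i):
    """Merge stack entries i and i+1 (adjacent in the array) and update the stack."""
    lo, len1 = runs[i]
    mid, len2 = runs[i + 1]
    a[lo:mid + len2] = merge(a[lo:mid], a[mid:mid + len2])
    runs[i] = (lo, len1 + len2)
    del runs[i + 1]

def merge_collapse(a, runs):
    """Merge neighbouring runs until both invariants hold at the top of the stack."""
    while len(runs) > 1:
        n = len(runs) - 2                        # second-from-top run
        if n > 0 and runs[n - 1][1] <= runs[n][1] + runs[n + 1][1]:
            if runs[n - 1][1] < runs[n + 1][1]:  # merge the middle run with its smaller neighbour
                n -= 1
            merge_at(a, runs, n)
        elif runs[n][1] <= runs[n + 1][1]:
            merge_at(a, runs, n)
        else:
            break                                # both invariants hold
```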



Sorting an interesting array

This panel shows the entire Timsort operation as on the previous panel, but on a larger array of 128 values, and at a faster speed. Minrun was set to 8 for illustration. Here we have generated input data that better illustrates the complexity of merging natural runs. The merge stack ensures that runs of similar size are preferentially merged with each other before the result is merged with the larger runs on the stack. This yields better performance.

So why is Timsort fast? First, very small input arrays are simply insertion sorted, which is efficient for small sizes. Next, existing runs in the input are identified and reused, up to and including the case where the input is already sorted and needs no work at all. This covers many cases that occur in practice. Finally, Mergesort is asymptotically fast for large input arrays, and that is what Timsort reduces to for large, random input.



Presorted input

Finally, this panel shows an example of running Timsort on partially presorted input. The sorted array above has had a small number of random values appended to its end, and requires sorting again. This could easily be handled by plain Insertion sort, but the more general Timsort recognises the situation by identifying the long existing run and handles the input quite efficiently.

I just want to mention how the animated implementation shown here deviates from the Python original. I have already spoken about the minrun parameter, which was artificially lowered. In addition, the Insertion sort used in the original is so-called ‘binary’ Insertion sort, whereas here we have used a simpler swap-based variant. Finally, the original code has a ‘galloping mode’ during merging, which we have not shown here at all. Galloping kicks in when many elements in a row are taken from one run without taking any from the other; instead of comparing element by element, the merge then searches for the next cut-off point in exponentially growing steps, which speeds things up.
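
To give a flavour of galloping, here is a sketch of the underlying exponential search: find where a key belongs in a sorted run by probing in doubling steps and then binary-searching the bracketed range. The function name is my own, and CPython's gallop_left/gallop_right routines (plus the MIN_GALLOP trigger logic) are considerably more elaborate.

```python
import bisect

def gallop_left(key, a, start, end):
    """Leftmost index in the sorted slice a[start:end] where `key` can be
    inserted without breaking the order."""
    step = 1
    # Double the step until the probed element is no longer smaller than key.
    while start + step < end and a[start + step] < key:
        step *= 2
    lo = start + step // 2 if step > 1 else start
    hi = min(start + step, end)
    # Binary search inside the bracketed range.
    return bisect.bisect_left(a, key, lo, hi)
```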


If you speak C, you can have a look at the original implementation for Python here (starting at line 2162).

Sources