March 2nd, 2019

Data structures: the Heap

This page is about a data structure called the ‘heap’ which can be used for sorting or as a ‘priority queue’. More about applications later. The main function of a heap is that you can cheaply remove the smallest element that is stored within it. Above you can see a heap in action with new elements being added continuously and later the smallest element is removed from the top. These are the two main operations of a heap. Other, less common operations include deleting an element from the heap, decreasing the value of a node or merging two heaps. We display the heap as a binary tree here, in which each node has up to two child nodes. But the heap does not have to be stored that way, and there will be more about this later.

Before we talk about the specific algorithms for inserting and removing elements, I want to point out the heap invariant. This is a certain order, which has to be maintained between operations.

Click on any node in the heap

The Heap Invariant

In each chain of elements from the root node to a leaf node, we require that a child node be larger or equal to its parent. This is required for any downward chain, but peer nodes need bear no special arrangement to each other. The smallest element is found at the top of the heap (there is the special case where there are several equally small elements). Any operation on the heap needs to maintain this property. This particular ordering is due to our heap being a ‘min-heap’. If it were a ‘max-heap’ we would require child elements to be no larger than their parent, with the largest element at the top.

Lets talk about inserting new elements into the heap.

Click on green node (left) to insert it into the heap

Inserting a node

The new element is initially placed in a node at the lowest level of the binary tree which is representing our heap, and at the rightmost end of that level. It then ‘bubbles up’. Bubbling up means that the new element is repeatedly compared to its parent, and swaps places with it, only if it is smaller than the parent. This comparing and swapping goes on until the parent is no larger than its child or the new node is at the top of the tree, and then the process of bubbling up is over. Note that the invariant is then maintained again with respect to the new node.

Click on root node (top) to remove it from the heap

Removing the root node

The next operation we shall look at, is removing the smallest element from the heap. As it is right at the top, initially we only have to take it away. But to maintain the heap invariant, we have to find a new smallest element to take its place. This is done by taking the last element at the bottom right, and placing it at the top, followed by ‘bubbling down’. For this, the new top node is compared to both children, and if one of them is smaller than the top, it is replaced with the smallest of the children. Ties can be broken randomly. Then, simply repeat the process with the same node that moved down, by comparing it again with its two child nodes. The process is over, when the node is smaller or equal to its children, or when it reaches the bottom level of the tree.

Storing the Heap in an Array

While the heap is best understood as a tree, it nicely fits into a linear array. Simply place each level of the tree, from top to bottom one after another, and each level left-to-right into an array. The next position for inserting an element at the end, is then simply the first unused entry in the array. The root node is simply the first entry in the array. Finding the parent node or the two child nodes of any given position is a straightforward calculation.

Application: Heapsort

There are many applications for the heap, for example storing paths found in a graph and extracting the so-far minimally distant one to try next. But the most obvious application is sorting numbers: put them all into the heap, and extract them one by one in the right order. Among the many efficient sorting algorithms, so-called heapsort is among the fastest. But there is a nice property that some sorting procedures have, which is the ability to do the sorting in-place, i.e. without extra storage like a second array to put the results in. Drawing on our observations from the last section, here is a method for sorting an array in-place using the heap data structure.

We start off with a list of random numbers and carry out a heapify operation, which turns the array into the now well-known heap shape. It does this by repeatedly bubbling the parent nodes down until our invariant is satisfied. Next the algorithm repeatedly extracts the root element and places it at the head of the formerly free space at the end of our array, which will then be successively filled with the sorted numbers. Note that by using a min-heap we ended up sorting the array in descending order. Using a max-heap we could have gotten an ascending sort.

Raison d’être: Performance

So what is the whole point of using the heap then, as opposed to a simple list or other data structure? The answer is the performance of the resulting structure in terms of time used for the main operations. When looking at the trees you might note that every level can accomodate twice as many nodes as the previous one. The required depth of the tree goes up logarithmically with the number of elements stored in the heap. This means it goes up rather slowly with size. And the time required for bubbling nodes up and down is a function of this depth of the tree. So no matter how many nodes are in the heap, the cost of inserting and removing elements is proportional to the logarithm of the size, i.e. only dependent on the order of magnitude of the number of elements in the heap.

I hope you enjoyed this animated explanation of the heap data structure, stay tuned for more to come on our main page.