August 26th, 2019

Data structures: Binary Search Trees

Binary search trees (BSTs) are the typical tree data structure, and are used for fast access to data for a range of operations. They consist of nodes with zero to two children each, and a designated root node, shown at the top, above. New nodes can be inserted continuously and removed while maintaining good performance properties for all operations. Occasionally a rebalancing of the tree is necessary, more about this later. The trees shown here are used to store integers up to 200. But in fact, any kind of data can be stored in the BST through reference, and the numbers which things are ordered by is called the ‘key’: it assigns an integer to every object in a node.

The trees shown on this page are limited in height for better display. The height is the maximum number of edges between the root and a leaf node. Real trees can become arbitrarily high.

The BST Invariant

The first step to understanding a new data structure is to know the main invariant, which has to be maintained between operations. For the BST it is defined per node: all values in the left subtree of a node have to be less than or equal to the value of the parent node, while the values in the right subtree of a node have to be larger than or equal to the value of the parent node. This has to be maintained for all nodes, subject only to exception for empty subtrees. Above we traverse the tree in order, visiting the entire left subtree of any node before visiting the parent and then the entire right subtree in order. This allows us to print the values in the tree in order.

Minimum and Maximum keys

The simplest operation on a BST is to find the smallest or largest entry respectively. For the former operation, simply follow the left child node pointer repeatedly, until there is no left child, which means the minimum value has been found. For the node with the maximum value, similarly follow the right child pointers repeatedly. This is displayed above for both minimum and maximum search.

Searching for arbitrary keys

Searching for an arbitrary key is similar to the previous operation of finding a minimum. Instead of always taking the left child pointer, the search has to choose between the left and right child and the attached subtree. If the desired key is less than the value of the current node, move to the left child node. If it is larger, simply move to the right child. If the value is equal to the sought key, the search terminates successfully at this present node. If the search ends at a node without an appropriate child node, the search terminates, failing to find the key.

Click on green node (left) to insert it into the tree

Inserting values

To insert a new value into the BST, we first find the right position for it. This is similar to the search for a key, discussed above. Upon finding a missing child node at the right position, simply add a new node to this parent. The case where the new key is already present in the tree is not a problem. We allow for duplicate entries, as the contents of e.g. the left subtree does not have to be strictly smaller than the parent node value, but can contain equal values just as well.

Click on any node in the tree to remove it

Deleting values

As above, to delete a node, we first find it in the tree, by search. If it has no children, being a so-called ‘leaf node’, we can simply remove it without further ado. If the node to be removed has one child node, we simply replace the node to be removed with the child at the same position. The hard part is the case where the node we want to remove has two child nodes. The procedure for that case is as follows: swap the positions of the removal node with it's predecessor according to the order of the BST. The predecessor will not have two children, so the removal node can be deleted from its new position using one of the two other cases above. You can try each of these cases by clicking to remove nodes above, and check whether the invariant is maintained after the operation.

Rebalancing trees

As you might have noticed by now, sometimes a binary tree becomes lopsided over time, like the one shown above, with all the nodes in the left or right subtree of the root. What the program can then do is called rebalancing. A node below the root is chosen to be a better root node than the current one. This is followed by a rotation of subtrees as shown above. We show both left and right rotations in this panel, but only execute one rotation at a time. I practice you might execute many rotations.

If you enjoyed this page, there are more algorithms and data structures to be found on the main page. In particular a similar tree structure is employed for the Heap.