10 May 2025

Tries

On other sub-pages here we have shown trees as a collection of orderable numbers, like heaps and search trees. On this page we will introduce tries, which are trees of letters in strings. Abstractly, a trie is a way to store string keys efficiently and associate them with values, in this case numbers.

On the panel above, we see a small trie derived from the words of the sentence “how much wood would a woodchuck chuck,” by inserting each word separately and associating it with its position in the sentence. Small nodes contain a letter within strings, but larger nodes additionally contain a value, whose presence also indicates termination of a possible key.


 

slowerplaypausefaster

Looking up keys

On this panel we have used the phrase “you know new york you need new york you know you need unique new york.” Here we show what happens as we traverse the trie looking up present (green) or absent (red) keys. If the letters of the key terminate on a node with a value (larger nodes), the lookup was successful. Other cases like no value present or not all letters possible indicate the key is not in the trie.


 

slowerplaypausefaster

Inserting keys

On the diagram above we start with an empty trie and successively insert the words in the sentence “she sells sea shells by the sea shore.” The algorithm mixes steps of finding positions in the trie and inserting chains of letters to complete a key word.


 

slowerplaypausefaster

Finding all keys with a given prefix

Tries permit more advanced search operations than just looking up a key. Here we are looking up all keys that possibly start with a given prefix. First the prefix has to be found in the trie and then all nodes beneath the prefix node have to be searched. If no such keys exist the algorithm reports this too.

An application of prefix search is command auto-completion. After entering the first few letters of a command, an algorithm suggests possible completions from a trie.


 

slowerplaypausefaster

Wildcard matching in a trie

Another advanced search operation on a trie is wildcard search, in which some positions in the lookup string can match any letter in the trie. We show this operation on this panel, where the lookup strings can contain a dot (•) for a wildcard. During the traversal of the trie, a yellow colored node indicates match of the wildcard. Again the result of such a lookup can be multiple keys which all match the input.



A gene code trie

A trie with a substantial amount of text from the Latin alphabet is hard to display on a screen like this, because the branch-out factor under each node ranges up to 26 children, the number of letters in the alphabet. Instead we resort to a smaller alphabet here, the genetic code which has only four letters: A, C, G and T. Even so, as you can see above, the trie uses up horizontal space pretty fast. Here we have inserted three letter long gene codes into a trie.

A real trie data structure might accommodate millions of gene sequences, each thousands of letters long and still retrieve keys efficiently.



A larger gene code trie

Here we have removed the letters from the trie and made the nodes smaller, which accommodates an extra level of the trie, again inserting genetic codes like above, four letters long.


If you enjoyed this page, there are more algorithms and data structures to be found on the main page.

Sources