Trie-字典树-前缀树原理
Summary in a Sentence
A Trie is an extension of the multi-way tree structure.
Tries offer many advantages when handling string-related operations, such as saving memory space for common string prefixes, facilitating prefix operations, and supporting wildcard matching.
This article is only an introduction to the principles of Tries (also known as dictionary trees or prefix trees). The hands-on implementation of TrieMap/TrieSet is placed in the data structure design section after the binary tree series exercises. The reason is the same as in the previous TreeMap/TreeSet principles. I do not intend to discuss the specific implementation of such complex structures in the basics section, as beginners do not need to understand the code implementation of Tries at this stage.
However, I still explain the principles of Tries here for two purposes:
To give you an intuitive feel for the various transformations of binary tree structures, helping you understand why my tutorial emphasizes binary tree structures.
To introduce you to this data structure early, making you aware of its API and applicable scenarios. In the future, when you encounter related problems, you might think of using a Trie to solve them, at least giving you an idea. You can always come back and copy the code template. The implementation of such data structures is fixed, and you won't be asked to manually write a Trie from scratch in exams or interviews. Just copy and use it directly.
Alright, let's get started.
For the explanation of Tries, I will guide you through implementing a TrieMap
and a TrieSet
. First, let's review the Map/Set types we have already implemented:
The standard hash table
HashMap
, which uses a hash function to store key-value pairs in thetable
array. There are two methods to resolve hash collisions. Its characteristic is speed, with basic operations such as addition, deletion, lookup, and modification having a time complexity ofO(1)
. The hash setHashSet
is a simple wrapper aroundHashMap
.The linked hash map
LinkedHashMap
, which is an enhancement of the standard hash table using a doubly linked list structure. It inherits the complexity of hash table operations and can maintain the insertion order of all keys in the hash table.LinkedHashSet
is a simple wrapper aroundLinkedHashMap
.The array hash map
ArrayHashMap
, which is an enhancement of the standard hash table using an array structure. It inherits the complexity of hash table operations and provides an additionalrandomKey
function, which can return a random key inO(1)
time.ArrayHashSet
is a simple wrapper aroundArrayHashMap
.The
TreeMap
, which is based on a binary search tree. Its basic operations such as addition, deletion, lookup, and modification have a time complexity ofO(logN)
. Its characteristic is the ability to dynamically maintain the order of key-value pairs and provide many additional APIs to manipulate key-value pairs.TreeSet
is a simple wrapper aroundTreeMap
.
TrieSet
is also a simple wrapper around TrieMap
, so let's focus on the implementation principles of TrieMap
below.
Main Application Scenarios of Trie Trees
Trie trees are a data structure specially optimized for strings, which is why they are also called prefix trees. Trie trees have several advantages in handling strings, which are listed below.
Saving Storage Space
Let's compare with HashMap
. For example, storing several key-value pairs like this:
Map<String, Integer> map = new HashMap<>();
map.put("apple", 1);
map.put("app", 2);
map.put("appl", 3);
Think back to the implementation principle of hash tables: key-value pairs are stored in the table
array. This means it actually creates the strings "apple"
, "app"
, and "appl"
, occupying 12 characters of memory space.
However, note that these three strings share a common prefix. The prefix "app"
is stored three times, and "l"
is stored twice.
If we use a TrieMap for storage:
// The key type of the Trie tree is fixed as String type, and the value type can be generic
TrieMap<Integer> map = new TrieMap<>();
map.put("apple", 1);
map.put("app", 2);
map.put("appl", 3);
A Trie tree does not store common prefixes multiple times, so you only need the memory space for the 5 characters in "apple"
to store the key.
In this example, the data size is small, and you might think that storing duplicates a few times is no big deal. However, if there are many keys, each very long, and with many common prefixes (which is often the case in real-world scenarios, such as with ID numbers), a Trie tree can save a lot of memory space.
Convenient for Handling Prefix Operations
Here's an example to make it clear:
// The key type of Trie tree is fixed as String, and the value type can be generic
TrieMap<Integer> map = new TrieMap<>();
map.put("that", 1);
map.put("the", 2);
map.put("them", 3);
map.put("apple", 4);
// "the" is the shortest prefix of "themxyz"
System.out.println(map.shortestPrefixOf("themxyz")); // "the"
// "them" is the longest prefix of "themxyz"
System.out.println(map.longestPrefixOf("themxyz")); // "them"
// "tha" is a prefix of "that"
System.out.println(map.hasKeyWithPrefix("tha")); // true
// There is no key with the prefix "thz"
System.out.println(map.hasKeyWithPrefix("thz")); // false
// "that", "the", and "them" are all prefixes of "th"
System.out.println(map.keysWithPrefix("th")); // ["that", "the", "them"]
Except for the keysWithPrefix
method, whose complexity depends on the length of the returned result, the complexity of other prefix operations is O(L)
, where L
is the length of the prefix string.
Think about the above operations. Can you achieve them with a HashMap or TreeMap? You would likely have to forcibly traverse all keys and compare each string prefix one by one, resulting in very high complexity.
By the way, isn't this keysWithPrefix
method perfect for implementing an autocomplete feature?