A trie is a tree-like data structure in which every node stores a character. A trie can store one or multiple words. If a trie stores all suffixes of a word, it is called a suffix trie. After you build a suffix trie, substrings can be retrieved easily. This post is about suffix trie implementation, search, and retrieval of a substring.
All substrings in a word are all contiguous sequences of characters within a string with all possible lengths. For example, a word “ana”, its substrings include “a”, “an”, “ana”, “n”, “na”, total 5 of them. Note, that all substrings are different from all suffixes of a word. In the example of “ana”, its suffixes are “ana”, “na”, “a”, total of 3 of them. A suffix trie holds all substrings, including suffixes, of a word.
This implementation uses a hash map (a dictionary in Python) to store children, which uses a character as a key to find the node holding the next character in this branch. The alternative solution uses an array to store children. The implementation using a hash map is more space-efficient than the array implementation.
Table of Content
- Map of suffix trie implementations
- Define a SuffixTrieNode classe
- Build a suffix trie
- Search a substring
- Print all nodes in trie
- Free download
Map of suffix trie implementations
Part 1 – Suffix trie implementation using array
Part 2 – Suffix trie implementation using hash map
Define a SuffixTrieNode class
A SuffixTrieNode is a class to create nodes in a suffix trie. It has 2 variables. children is a hash map data structure, in which the key is a character, and the value is the node that stores the next character in a suffix. indices is a list data structure that stores the start position of a substring in a word.
Java
1 2 3 4 | class SuffixTrieNode { HashMap<Character, SuffixTrieNode> children = new HashMap<Character, SuffixTrieNode>(); ArrayList<Integer> indexes = new ArrayList<Integer>(); } |
Javascript
1 2 3 4 5 6 | class SuffixTrieNode { constructor() { this.children = new Map(); this.indexes = []; } } |
Python
1 2 3 4 5 | class SuffixTrieNode: def __init__(self) : #self.data = None self.children = {} # map self.indexes = [] |
Build a suffix trie
A SuffixTrie class has one variable root. All operations, such as insertion, search, or traversal, start from the root. In a suffix trie, only the root has multiple children. Other nodes except leaves have one child, which is the node of the next character in the suffix. You can initialize the root in the constructor.
To build a suffix trie from a word, you use the word as the input of the constructor. You subtract the suffix and call the insertSuffix method with the root, a suffix, and the index which is the start position of the suffix in the word as input.
insertSuffix method is a recursive function. The first thing is to save the index in the input node’s indices. It records the position of the substring inside the input node. It will be used in search. If the input string is null or its length is 0, it is a terminal condition. All recursive calls return to calling functions.
Otherwise, you retrieve the first character of the input string. If the character is one of the keys in the input node’s children, you get the next node by using the character as the key. If the character doesn’t exist in the key set, a child node is created. You save the new entry with the character as the key, and the new node as the value in children. Then the method calls itself with the new node, the substring after the first character, and the index.
Java
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | public class SuffixTrieMap { private SuffixTrieNode root = new SuffixTrieNode(); //Constructor, Call insert, Time O(s^2), Space O(s), s is string length public SuffixTrieMap(String s) { for (int i = 0; i < s.length(); i++) { String suffix = s.substring(i); System.out.println(suffix); insertSuffix(root, suffix, i); } } //Insert a suffix node, Recursion, Time O(s), Space O(s) public void insertSuffix(SuffixTrieNode node, String s, int index) { node.indexes.add(index); if (s != null && s.length() > 0) { char c = s.charAt(0); SuffixTrieNode child = null; if (node.children.containsKey(c)) { child = node.children.get(c); } else { child = new SuffixTrieNode(); node.children.put(c, child); } insertSuffix(child, s.substring(1), index); } } |
Javascript
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | class SuffixTrieMap { //Constructor, Call insert, Time O(s^2), Space O(s), s is string length constructor (s) { this.root = new SuffixTrieNode(); for (let i = 0; i < s.length; i++) { let suffix = s.substring(i); this.insertSuffix(this.root, suffix, i); } } //Insert a suffix node, Recursion, Time O(s), Space O(s) insertSuffix(node, s, index) { node.indexes.push(index); if (s != null && s.length > 0) { var c = s.charAt(0); var child = null; if (node.children.has(c)) { child = node.children.get(c); } else { child = new SuffixTrieNode(); node.children.set(c, child); } this.insertSuffix(child, s.substring(1), index); } } |
Python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | class SuffixTrieMap: #Constructor, Call insert, Time O(s^2), Space O(s), s is string length def __init__(self, s) : self.root = SuffixTrieNode() for i in range(0, len(s)) : suffix = s[i:] self.insertSuffix(self.root, suffix, i) #Insert a suffix node, Recursion, Time O(s), Space O(s) def insertSuffix(self, node, s, index) : node.indexes.append(index) if s and len(s) > 0 : c = s[0] child = None if c in node.children.keys() : child = node.children.get(c) else: child = SuffixTrieNode() node.children[c] = child self.insertSuffix(child, s[1:], index) |
Search a substring
Search is to find a substring in the word. The search method is a wrapper method of searchHelper. The start node is the root. searchHelper is a recursive method. When the input string is null or the length is 0, it is a terminal condition. You return with the result which is stored in the node’s indices. Otherwise, you take the first character of the input string. If the character is one of the keys in the input node’s children, get the next node by using the character as the key. Then the method calls itself to search for the next character. If the first character doesn’t exist in the children‘s key set, return null.
Java
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | //Search string, Recursion, Time O(s), Space O(s) public List<Integer> search(String str) { return searchHelper(root, str); } //recursion, Time O(s), Space O(s) private ArrayList<Integer> searchHelper(SuffixTrieNode node, String s) { if (s == null || s.length() == 0) { return node.indexes; } else { char first = s.charAt(0); if (node.children.containsKey(first)) { return searchHelper(node.children.get(first), s.substring(1)); } } return null; } |
Javascript
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | //Search string, Recursion, Time O(s), Space O(s) search(str) { return this.searchHelper(this.root, str); } //recursion, Time O(s), Space O(s) searchHelper(node, s) { if (s == null || s.length == 0) { return node.indexes; } else { var first = s.charAt(0); if (node.children.has(first)) { return this.searchHelper(node.children.get(first), s.substring(1)); } } return null; } |
Python
1 2 3 4 5 6 7 8 9 10 11 12 13 | #Search string, Recursion, Time O(s), Space O(s) def search(self, str): return self.searchHelper(self.root, str) #recursion, Time O(s), Space O(s) def searchHelper(self, node, s) : if s is None or len(s) == 0 : return node.indexes else: first = s[0] if first in node.children.keys() : return self.searchHelper(node.children.get(first), s[1:]) return None |
Print all nodes in trie
Now you can print all nodes in a trie, you use recursion again. When the input node is null, it is a terminal condition, you can return. Otherwise, you print the input string, which is the substring from the root to the current node. Then you check the entries in the input node’s children. If an entry is not null, the method calls itself with the entry’s key and value to print the next substring in this branch.
Java
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | //Print all suffixes, Time O(s^2), Space O(s) public void printTrie() { printHelper("", root); } //Print helper, Recursion, Time O(s^2), Space O(s) private void printHelper(String s, SuffixTrieNode node) { if (node == null) return; System.out.println(s); for (Map.Entry<Character, SuffixTrieNode> m : node.children.entrySet()) { printHelper(s + m.getKey(), m.getValue()); } } |
Javascript
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | //Print all suffixes, Time O(s^2), Space O(s) print() { this.printHelper("", this.root); } //Print helper, Recursion, Time O(s^2), Space O(s) printHelper(s, node) { if (node == null) return; console.log(s); for (let [key, value] of node.children) { this.printHelper(s + key, value); } } |
Python
1 2 3 4 5 6 7 8 9 10 11 | #Print all suffixes, Time O(s^2), Space O(s) def printTrie(self) : self.printHelper("", self.root) #Print helper, Recursion, Time O(s^2), Space O(s) def printHelper(self, s, node): if node is None: return print(s) for key, value in node.children.items(): self.printHelper(s + key, value) |
Free download
Download suffix trie implementation using map in Java, JavaScript and Python
Trie implmentation using map
Data structures introduction PDF