shecc
shecc copied to clipboard
Optimize memory usage in trie by using radix tree
The current Trie implementation allocates 128 entries for child nodes at each node, but in practice, C function names only consist of 63 distinct characters ('_', '0'-'9', 'a'-'z', 'A'-'Z'). This leads to inefficient memory usage, especially in large Trie structures where the underutilized space becomes more pronounced. To optimize memory usage, we can consider replacing the current Trie structure with a Radix Tree. A Radix Tree is a space-optimized version of a Trie that compresses nodes with a single child, thereby reducing the number of nodes and significantly lowering memory consumption. Additionally, this optimization could not only save memory but also potentially improve the performance of operations such as insertions and lookups by decreasing the depth of the tree.
How about trie-hard?
This issue is on my backlog, and I’ll find time to implement the optimization.
I took a brief look at trie-hard, and it seems to be used when a large number of misses are expected. However, it makes me wonder if our workload falls into this category as well?
commit c50f8c15b0e4b57d6409fc12d413d88e0266fa7b replaced trie with general string-based-key hashmap.