shecc icon indicating copy to clipboard operation
shecc copied to clipboard

Optimize memory usage in trie by using radix tree

Open visitorckw opened this issue 1 year ago • 2 comments
trafficstars

The current Trie implementation allocates 128 entries for child nodes at each node, but in practice, C function names only consist of 63 distinct characters ('_', '0'-'9', 'a'-'z', 'A'-'Z'). This leads to inefficient memory usage, especially in large Trie structures where the underutilized space becomes more pronounced. To optimize memory usage, we can consider replacing the current Trie structure with a Radix Tree. A Radix Tree is a space-optimized version of a Trie that compresses nodes with a single child, thereby reducing the number of nodes and significantly lowering memory consumption. Additionally, this optimization could not only save memory but also potentially improve the performance of operations such as insertions and lookups by decreasing the depth of the tree.

visitorckw avatar Aug 13 '24 13:08 visitorckw

How about trie-hard?

jserv avatar Oct 02 '24 15:10 jserv

This issue is on my backlog, and I’ll find time to implement the optimization.

I took a brief look at trie-hard, and it seems to be used when a large number of misses are expected. However, it makes me wonder if our workload falls into this category as well?

visitorckw avatar Oct 03 '24 15:10 visitorckw

commit c50f8c15b0e4b57d6409fc12d413d88e0266fa7b replaced trie with general string-based-key hashmap.

jserv avatar May 10 '25 16:05 jserv