stacks-core icon indicating copy to clipboard operation
stacks-core copied to clipboard

[MARF] Investigate sparse trie representations

Open jcnelson opened this issue 2 months ago • 2 comments

@kantai and myself suspect that the bulk of redundant data in a MARF index comes from one of a handful of places:

  • When performing a copy-on-write to produce a new trie, most of the nodes on the trie path will not change very much between two subsequent tries.

  • When performing a copy-on-write, most trie backpointers will point to the same nodes they did when they were created

  • A substantial number of leaves will have the (same) empty hash, since their keys are unmapped

  • We don't use varint encodings for integers, even though most integers fit into fewer than four bytes

I need to figure out how much of a potential space savings we can get by addressing some or all of the above.

jcnelson avatar Oct 13 '25 18:10 jcnelson

Discoveries so far:

  • Not storing the back_block field in a TriePtr for non-backptr nodes saves 10-15% of space in tests. This value is always 0 for non-backptr nodes

  • Not storing the intermediate hashes of each node (which are only needed on Merkle proof construction) saves about 5% of space in tests. These can be regenerated on-the-fly.

jcnelson avatar Oct 16 '25 02:10 jcnelson

More discoveries:

  • Compressing sparse lists of TriePtr (whose contents are largely empty pointers) produces another 5-10% space savings. We instead store a bitmap of which TriePtr slots are non-empty, and only store the non-empty TriePtrs.

  • Storing only the difference in child pointers between a node in an ancestor trie and the copy of the node pulled forward into a new trie via copy-on-write (COW) produces a whopping 30-50% space savings. We limit the "patch depth" to 16 in this test; if a node would need to be patched 17 or more times, we instead store the COW node directly (in order to limit the read amplification of scanning through a sequence of patches across ancestor tries).

I haven't tried this on chainstate yet, but in microbencharks I'm seeing 50%-75% overall size reduction with my patch set.

jcnelson avatar Nov 01 '25 02:11 jcnelson