stacks-core Feat/marf compression

This PR implements work discovered in #6593. Specifically, it makes the following changes to the on-disk representation of the MARF:

If a TriePtr is not a back-block pointer, then the back_block field is not stored since it's always 0.
If a node's children TriePtr list is sparse, then instead of storing TriePtr::default() for empty pointers, the system now stores a bitmap of which pointers are non-empty, and then only stores the non-empty TriePtrs. It only does this if this actually saves space over just storing the entire list (storing the entire list is cheaper if the list is nearly full).
Instead of copying a trie node from one trie to the next as part of a copy-on-write, the system will only store a patch from the old node to the new node. It will create a list of up to 16 patches across 16 tries before storing a full copy of the node.

In my (very small scale) benchmarks, this saves over 50% of space.

This PR is still a draft and will likely remain so for some time. It needs a lot more unit tests, and it would benefit significantly from fuzz and property testing against the current implementation. In addition, it will need a lot of performance tuning, since the act of reading a list of patch nodes will slow down reads.

Nov 05 '25 05:11 jcnelson

Bitmap flag (aka 0xff): Could we use a bit of the Node ID to indicate whether a bitmap is present? This could save one byte per node and rely on the node format described by the Node ID itself.

The need for the sparse TriePtr bitmap flag is to ensure that this byte cannot be interpreted as a TrieNodeID, so I don't think so.

Compression at target: The current code correctly handles both compressed and uncompressed MARF scenarios. Once compression becomes the standard, could we plan on simplify the code to follow a single “linear” path, keeping only the compression-related logic?

I don't think so. We have no way to compel existing node operators to re-compress their chainstate, and they could continue to use uncompressed chainstate even across hard forks.

Node ID as a type: Currently, TriePtr id is represented as u8 (not introduced by this PR). Since we frequently interact with it, checking flags and node types, it might be useful to introduce a dedicated type. This could encapsulate convenient behaviors and avoid repeated conversions and manual flag interpretation. (In that case, this change could be addressed in a separate PR)

That's fine in principle, but that kind of refactoring would change potentially many lines. Let's first make sure the functional test coverage is sufficiently adequate that we feel comfortable shipping this, and then we can worry about the refactoring in a follow-up PR.

Nov 21 '25 19:11 jcnelson

Alright, all CI tests pass @federico-stacks

Dec 03 '25 02:12 jcnelson

Codecov Report

:x: Patch coverage is 73.45845% with 396 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.31%. Comparing base (c9003aa) to head (fdc3e60).

Files with missing lines	Patch %	Lines
stackslib/src/chainstate/stacks/index/node.rs	82.28%	107 Missing :warning:
stackslib/src/chainstate/stacks/index/storage.rs	69.23%	100 Missing :warning:
...tackslib/src/chainstate/stacks/index/test/cache.rs	55.95%	85 Missing :warning:
stackslib/src/chainstate/stacks/index/bits.rs	66.39%	81 Missing :warning:
stackslib/src/chainstate/stacks/index/test/marf.rs	88.40%	8 Missing :warning:
stackslib/src/chainstate/stacks/index/marf.rs	82.14%	5 Missing :warning:
stackslib/src/chainstate/stacks/index/mod.rs	0.00%	4 Missing :warning:
stacks-common/src/codec/mod.rs	62.50%	3 Missing :warning:
stackslib/src/chainstate/stacks/index/file.rs	60.00%	2 Missing :warning:
stackslib/src/chainstate/stacks/index/trie.rs	66.66%	1 Missing :warning:

:x: Your project check has failed because the head coverage (63.31%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

:exclamation: There is a different number of reports uploaded between BASE (c9003aa) and HEAD (fdc3e60). Click for more details.

HEAD has 10 uploads less than BASE

Flag BASE (c9003aa) HEAD (fdc3e60)

73 63

Flag	BASE (c9003aa)	HEAD (fdc3e60)
	73	63

Additional details and impacted files

@@             Coverage Diff              @@
##           develop    #6654       +/-   ##
============================================
- Coverage    77.90%   63.31%   -14.59%     
============================================
  Files          580      580               
  Lines       361187   362181      +994     
============================================
- Hits        281374   229307    -52067     
- Misses       79813   132874    +53061

Files with missing lines	Coverage Δ
stackslib/src/chainstate/stacks/index/cache.rs	`85.08% <ø> (-11.15%)`	:arrow_down:
stackslib/src/chainstate/stacks/index/test/file.rs	`57.89% <ø> (-38.95%)`	:arrow_down:
stackslib/src/chainstate/stacks/index/test/mod.rs	`86.27% <100.00%> (-7.76%)`	:arrow_down:
stackslib/src/chainstate/stacks/index/trie_sql.rs	`85.36% <100.00%> (ø)`
stackslib/src/chainstate/stacks/index/trie.rs	`84.59% <66.66%> (ø)`
stackslib/src/chainstate/stacks/index/file.rs	`87.45% <60.00%> (-2.58%)`	:arrow_down:
stacks-common/src/codec/mod.rs	`80.17% <62.50%> (-4.09%)`	:arrow_down:
stackslib/src/chainstate/stacks/index/mod.rs	`46.57% <0.00%> (-1.32%)`	:arrow_down:
stackslib/src/chainstate/stacks/index/marf.rs	`78.94% <82.14%> (-3.53%)`	:arrow_down:
stackslib/src/chainstate/stacks/index/test/marf.rs	`31.19% <88.40%> (-12.29%)`	:arrow_down:
... and 4 more

... and 370 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c9003aa...fdc3e60. Read the comment docs.

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Dec 05 '25 12:12 codecov[bot]

@federico-stacks Please, do not force-push changes. It makes it very difficult to see what has changed in-between reviews. Instead, please merge the two branches and create a merge commit. Thanks!

Dec 15 '25 18:12 jcnelson

@federico-stacks Please, do not force-push changes. It makes it very difficult to see what has changed in-between reviews. Instead, please merge the two branches and create a merge commit. Thanks!

Clear. In this case, I only force-pushed the last commit (the Node patch fix) because I had missed a small related change. I did it that way to keep everything contained in a single commit and make it easier to reference in the related comment (and possibly simplify the review). No commit history was lost.

Dec 16 '25 08:12 federico-stacks