feat: tweak sparse array settings
PR Overview
This pull request introduces optimizations to SHAMapInnerNode construction in the rippled codebase. The optimizations center on improving memory allocation for the sparse arrays of child pointers and hashes within the SHAMap's inner nodes.
Key Changes
- Optimized allocation in makeFullInner:
  - The method now pre-calculates the number of non-empty branches by scanning the serialized data once. This allows a single, precise memory allocation for the SHAMapInnerNode based on the actual number of non-empty branches, avoiding subsequent reallocations (see the sketch after this list).
  - The isBranch_ bitmask is set directly from the pre-calculated information, streamlining node initialization.
  - Unnecessary default constructions of SHAMapHash objects for empty branches are avoided, reducing initialization overhead.
- Streamlined makeCompressedInner:
  - The SHAMapInnerNode is now initialized with the exact count of non-empty branches, derived from the serialized data size.
  - Additional validation ensures that serialized branch positions are in ascending order.
- Updated sparse array boundaries:
  - Based on empirical data from ledger 83,592,201, the boundaries for the sparse array in TaggedPointer.ipp have been adjusted to align with the observed distribution of children in inner nodes, aiming to minimize memory reallocations and improve overall memory usage.
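To make the one-pass pre-count concrete, here is a minimal sketch of the idea behind the makeFullInner change, not the actual implementation; the helper name scanFullInner, the hashBytes constant, and the all-zero-hash-means-empty convention are assumptions made for illustration.

```cpp
// Illustrative sketch only -- not rippled's actual makeFullInner.
// One pass over the serialized child hashes records which branches are
// non-empty, so the inner node's sparse child array can be sized exactly once.
#include <algorithm>
#include <cstddef>
#include <cstdint>

constexpr std::size_t hashBytes = 32;  // assumed size of one serialized hash

// `data` points at 16 consecutive serialized child hashes; an all-zero hash is
// assumed to mark an empty branch.
inline std::uint16_t
scanFullInner(std::uint8_t const* data)
{
    std::uint16_t isBranch = 0;
    for (int b = 0; b != 16; ++b)
    {
        std::uint8_t const* h = data + b * hashBytes;
        bool const empty = std::all_of(
            h, h + hashBytes, [](std::uint8_t c) { return c == 0; });
        if (!empty)
            isBranch |= static_cast<std::uint16_t>(1u << b);
    }
    // std::popcount(isBranch) is the exact child count: allocate once with
    // that capacity, set the bitmask directly, and skip default-constructing
    // hashes for the empty branches.
    return isBranch;
}
```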
Expected Impact
The changes are expected to reduce memory overhead by avoiding unnecessary reallocations and default constructions when building inner nodes from serialized data.
I had (have?) a plan for loading binary dumps that involved creating a tree efficiently with no reallocations. While beating my head against the SHAMap code, I noticed some opportunities to make the creation of inner nodes a little more efficient.
From ledger 83,592,201 (histogram of inner-node child counts, drawn using the prior boundary points of 2, 4, 6, 16):
There does seem to be a fair bit of overallocation in the middle of the tree.
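For context, a simplified sketch of what the boundary points mean, assuming the same bucketing idea as TaggedPointer.ipp (the capacityFor helper and its rounding rule are illustrative, not the real code): a node with n children gets the capacity of the smallest bucket that fits it, so it only reallocates when it outgrows its current bucket.

```cpp
// Simplified illustration of boundary bucketing, using the prior boundary
// points 2, 4, 6, 16 mentioned above. Not the actual TaggedPointer.ipp code.
#include <array>

constexpr std::array<int, 4> boundaries{2, 4, 6, 16};

constexpr int
capacityFor(int numChildren)
{
    for (int b : boundaries)
        if (numChildren <= b)
            return b;  // smallest bucket that fits
    return 16;         // an inner node never has more than 16 branches
}

// e.g. a node with 3 children gets 4 slots and will not reallocate until it
// grows past 4; a node with 7 children jumps straight to the full 16.
static_assert(capacityFor(3) == 4);
static_assert(capacityFor(7) == 16);
```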
The TaggedPointer's stated reason to live:
The motivation for this class is saving RAM. A large percentage of inner
nodes only store a small number of children. Memory can be saved by
storing the inner node's children in sparse arrays. Measurements show
that on average a typical SHAMap's inner nodes can be stored using only
25% of the original space.
The "Exact Allocation" referenced there is based on calculations involving spending 8 bytes extra "somewhere" to encode the exact number of children.
I'm still foggy on the lifecycle of nodes in rippled, and on when they can be treated as immutable, but it seems that, in principle, hashed nodes ought to be able to have the exact number of children allocated "somehow".
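As an aside, a minimal sketch of the sparse-array idea described in that comment, purely illustrative and not rippled's TaggedPointer/SHAMapInnerNode code (the Hash stand-in and the member names are assumptions): a 16-bit bitmask records which branches are occupied, and only those children are stored densely, so a node with k children pays for roughly k slots instead of 16.

```cpp
// Illustrative sketch only -- not rippled's TaggedPointer/SHAMapInnerNode code.
// A 16-bit bitmask records which branches are occupied; only those children
// are stored, densely and in branch order.
#include <array>
#include <bit>      // std::popcount (C++20)
#include <cstddef>
#include <cstdint>
#include <vector>

struct Hash  // stand-in for SHAMapHash
{
    std::array<std::uint8_t, 32> data{};
};

struct SparseInner
{
    std::uint16_t isBranch = 0;  // bit b set => branch b is non-empty
    std::vector<Hash> hashes;    // one entry per set bit, in branch order

    // Dense index of branch b = number of occupied branches below b.
    std::size_t
    slot(int b) const
    {
        auto const below =
            static_cast<std::uint16_t>(isBranch & ((1u << b) - 1u));
        return static_cast<std::size_t>(std::popcount(below));
    }

    bool
    has(int b) const
    {
        return (isBranch >> b) & 1u;
    }

    void
    set(int b, Hash const& h)
    {
        if (has(b))
            hashes[slot(b)] = h;
        else
        {
            hashes.insert(hashes.begin() + slot(b), h);
            isBranch |= static_cast<std::uint16_t>(1u << b);
        }
    }
};
```

Note that std::popcount of the full bitmask already yields the exact child count, which is roughly the quantity an "exact allocation" at construction time would size for.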
Thanks @sublimator I will assign myself to review this.
@sublimator Just FYI: I think this repository has some of the analysis I did to choose the boundaries: https://github.com/seelabs/shamap_compress/ In particular, I should review this Python notebook: https://github.com/seelabs/shamap_compress/blob/master/shamap_compress.ipynb I haven't had time to review this PR yet, but I'll look forward to refreshing my memory on this stuff.
Thanks for taking a look Scott :)
I am trying to stay off the computer for a few days - but I looked (on phone, so technically ...) quickly at the child count histogram in that repo and saw pretty different numbers to what I got.
When I am back on, I will double-check the child counts I posted. I did note some deltas in some of the stats in historical vs latest ledgers I looked at recently.
At the risk of tricking myself back onto the computer - perhaps we could expand the scope here to some kind of stats command?
Much of the team is currently on vacation. We will certainly have a look at this in January. Thanks for the ping!
Internal tracker: RPFC-86
Ping
One request for the ping bot: remove old ping comments when adding a new one. @sublimator
Hahaha :)
Sorry for the delay. I'll try to get to this later this week.
Nice one! Thanks Scott
@sublimator as these changes are expected to impact performance, please provide these details for the performance team to understand and test:
- Is this a new feature, bug fix, or improvement to existing functionality?
- What behavior/functionality does the change impact?
- In what processing can the impact be measured? Be as specific as possible - e.g. RPC client call, payment transaction that involves LOB, AMM, caching, DB operations, etc.
- Does this change affect concurrent processing - e.g. does it involve acquiring locks, multi-threaded processing, or async processing?
Thanks!
@intelliot
Will wait on Scott's input first (@seelabs) as I'm much busier now on other things than when I first opened this PR.
That said, I can quickly say:
- I expect the performance changes to be rather minor
- This PR represents initial baby steps and a renewed focus on this part of the code after many years (the child counts seem to have changed considerably)
- Includes the beginnings of a command to get shamap stats which could prove useful for subsequent changes that could improve performance
I'll plan on running this patch overnight and comparing the memory footprints of a server with and without this patch.
Thanks!
This is what I got from last night's run, but I'll need to do it again. I realized that the two servers had different path finding settings, and that makes a difference. I'll re-run the test again tonight. (I thought I'd show this result anyway, but we'll have a better apples to apples comparison tomorrow).
I doubt it will be much at all - from memory - IIRC you can save a bit more (a few hundred MB??) when allocating exactly, per the [experiment] tagged branch (quick hack PoC)
But I guess "take care of cents and the dollars take care of themselves"
@seelabs
Btw, somewhat related to path finding, I stumbled across this a while back: https://github.com/XRPLF/rippled/issues/4790
I think the order book is only used for path finding right?
Here are the charts from last night. I made sure both servers had the same configs (other than db paths, since I ran them on the same machine):
Given those charts it looks like this change uses a bit more memory than the current version - low hundreds of MB. Honestly, that's surprising; I did expect a memory savings. Let me double-check the scripts I used to run this to make sure I didn't make a dumb error like swapping "base" and "new" (after a quick check the script looks right to me, though).
@sublimator Thanks for the bug report on path finding. I'll comment there.
Given those charts it looks like this change uses a bit more memory than the current version
Interesting! I will have to have a look at it again. Been a while.
Do you have any theory as to why it might use more RAM? Assuming it is, and consistently so? I expected it to be marginal (even bordering on theoretical) at best, but I still thought there would be a measurable improvement. I spent more time eyeballing the [experiment] branch, to be fair, but figured this would be less controversial.
How much time do you have to dig into this? What can we salvage here? Does reverting the boundaries "fix" the memory increase? If so, it's potentially worth just merging that until there's time for the next round of investigation? But I'm not sure Ripple "works like that".
New is greater than old in the first chart, but the delta is negative in the subsequent charts, which suggests to me there is some mistaken switching going on. Is the first chart mislabeled, or are the other charts oddly reporting the change from new to old instead of the other way?
@thejohnfreeman The delta is negative because it's measuring "memory savings". So positive is a memory savings, since I use these scripts to find out "how much memory savings do I get with some new code". Yeah, that's a little weird, but it is working as intended.
@sublimator Let me revert the boundary changes and make sure there's no memory difference, and if there isn't I'll take a close look at the code. I think doing the correct allocation upfront is a good change (though I still need to look closely...), so yeah, I do think we want some of these changes even if we don't want the boundary changes yet.
I don't really have bandwidth atm to dive back into this much. While you're in the area, it may be worth looking at the experiment branch (insomuch as it seemed to reduce memory).
I think the shamap_info command could grow into something useful
Cheers
@seelabs @thejohnfreeman
Did you ever look at this more? What's the plan? What should be done with this?
Thanks!