feat: tweak sparse array settings
PR Overview
This pull request introduces optimizations to SHAMapInnerNode construction in the rippled codebase. The optimizations center on improving memory allocation for the sparse arrays of child pointers and hashes within the SHAMap's inner nodes.
Key Changes
- Optimized allocation in makeFullInner:
  - The method now pre-calculates the number of non-empty branches by scanning the serialized data once. This allows a single, precise memory allocation for the SHAMapInnerNode based on the actual number of non-empty branches, avoiding subsequent reallocations (see the sketch after this list).
  - The isBranch_ bitmask is set directly from the pre-calculated information, streamlining node initialization.
  - Unnecessary default constructions of SHAMapHash objects for empty branches are avoided, reducing initialization overhead.
- Streamlined makeCompressedInner:
  - The SHAMapInnerNode is now initialized with the exact count of non-empty branches, derived from the serialized data size.
  - Additional validation ensures that serialized branch positions are in ascending order.
- Updated sparse array boundaries:
  - Based on empirical data from ledger 83,592,201, the boundaries for the sparse array in TaggedPointer.ipp have been adjusted to align with the observed distribution of children in inner nodes, aiming to minimize memory reallocations and improve overall memory usage.
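To make the one-pass pre-count concrete, here is a minimal sketch of the idea behind the makeFullInner change, not the actual implementation; the helper name scanFullInner, the hashBytes constant, and the all-zero-hash-means-empty convention are assumptions made for illustration.

```cpp
// Illustrative sketch only -- not rippled's actual makeFullInner.
// One pass over the serialized child hashes records which branches are
// non-empty, so the inner node's sparse child array can be sized exactly once.
#include <algorithm>
#include <cstddef>
#include <cstdint>

constexpr std::size_t hashBytes = 32;  // assumed size of one serialized hash

// `data` points at 16 consecutive serialized child hashes; an all-zero hash is
// assumed to mark an empty branch.
inline std::uint16_t
scanFullInner(std::uint8_t const* data)
{
    std::uint16_t isBranch = 0;
    for (int b = 0; b != 16; ++b)
    {
        std::uint8_t const* h = data + b * hashBytes;
        bool const empty = std::all_of(
            h, h + hashBytes, [](std::uint8_t c) { return c == 0; });
        if (!empty)
            isBranch |= static_cast<std::uint16_t>(1u << b);
    }
    // std::popcount(isBranch) is the exact child count: allocate once with
    // that capacity, set the bitmask directly, and skip default-constructing
    // hashes for the empty branches.
    return isBranch;
}
```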
Expected Impact
The changes are expected to reduce memory overhead by avoiding unnecessary reallocations and default constructions when building inner nodes from serialized data.
I had (have?) a plan for loading binary dumps that involved creating a tree efficiently with no reallocations. While beating my head against the SHAMap code, I noticed some opportunities to make the creation of inner nodes a little more efficient.
From ledger 83,592,201 (histogram of inner-node child counts, drawn using the prior boundary points of 2, 4, 6, 16):
There does seem to be a fair bit of overallocation in the middle of the tree.
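For context, a simplified sketch of what the boundary points mean, assuming the same bucketing idea as TaggedPointer.ipp (the capacityFor helper and its rounding rule are illustrative, not the real code): a node with n children gets the capacity of the smallest bucket that fits it, so it only reallocates when it outgrows its current bucket.

```cpp
// Simplified illustration of boundary bucketing, using the prior boundary
// points 2, 4, 6, 16 mentioned above. Not the actual TaggedPointer.ipp code.
#include <array>

constexpr std::array<int, 4> boundaries{2, 4, 6, 16};

constexpr int
capacityFor(int numChildren)
{
    for (int b : boundaries)
        if (numChildren <= b)
            return b;  // smallest bucket that fits
    return 16;         // an inner node never has more than 16 branches
}

// e.g. a node with 3 children gets 4 slots and will not reallocate until it
// grows past 4; a node with 7 children jumps straight to the full 16.
static_assert(capacityFor(3) == 4);
static_assert(capacityFor(7) == 16);
```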
The TaggedPointer's stated reason to live:
The motivation for this class is saving RAM. A large percentage of inner
nodes only store a small number of children. Memory can be saved by
storing the inner node's children in sparse arrays. Measurements show
that on average a typical SHAMap's inner nodes can be stored using only
25% of the original space.
The "Exact Allocation" referenced there is based on calculations involving spending 8 bytes extra "somewhere" to encode the exact number of children.
I'm still foggy on the lifecycle of nodes in rippled, and on when they can be treated as immutable, but it seems that, in principle, hashed nodes ought to be able to have the exact number of children allocated "somehow".
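As an aside, a minimal sketch of the sparse-array idea described in that comment, purely illustrative and not rippled's TaggedPointer/SHAMapInnerNode code (the Hash stand-in and the member names are assumptions): a 16-bit bitmask records which branches are occupied, and only those children are stored densely, so a node with k children pays for roughly k slots instead of 16.

```cpp
// Illustrative sketch only -- not rippled's TaggedPointer/SHAMapInnerNode code.
// A 16-bit bitmask records which branches are occupied; only those children
// are stored, densely and in branch order.
#include <array>
#include <bit>      // std::popcount (C++20)
#include <cstddef>
#include <cstdint>
#include <vector>

struct Hash  // stand-in for SHAMapHash
{
    std::array<std::uint8_t, 32> data{};
};

struct SparseInner
{
    std::uint16_t isBranch = 0;  // bit b set => branch b is non-empty
    std::vector<Hash> hashes;    // one entry per set bit, in branch order

    // Dense index of branch b = number of occupied branches below b.
    std::size_t
    slot(int b) const
    {
        auto const below =
            static_cast<std::uint16_t>(isBranch & ((1u << b) - 1u));
        return static_cast<std::size_t>(std::popcount(below));
    }

    bool
    has(int b) const
    {
        return (isBranch >> b) & 1u;
    }

    void
    set(int b, Hash const& h)
    {
        if (has(b))
            hashes[slot(b)] = h;
        else
        {
            hashes.insert(hashes.begin() + slot(b), h);
            isBranch |= static_cast<std::uint16_t>(1u << b);
        }
    }
};
```

Note that std::popcount of the full bitmask already yields the exact child count, which is roughly the quantity an "exact allocation" at construction time would size for.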
Thanks @sublimator I will assign myself to review this.
@sublimator Just FYI: I think this repository has some of the analysis I did to choose the boundaries: https://github.com/seelabs/shamap_compress/ In particular, I should review this Python notebook: https://github.com/seelabs/shamap_compress/blob/master/shamap_compress.ipynb I haven't had time to review this PR yet, but I'll look forward to refreshing my memory on this stuff.
Thanks for taking a look Scott :)
I am trying to stay off the computer for a few days - but I looked (on phone, so technically ...) quickly at the child count histogram in that repo and saw pretty different numbers to what I got.
When I am back on, I will double-check the child counts I posted. I did note some deltas in some of the stats in historical vs latest ledgers I looked at recently.
At the risk of tricking myself back onto the computer - perhaps we could expand the scope here to some kind of stats command?
Much of the team is currently on vacation. We will certainly have a look at this in January. Thanks for the ping!
Internal tracker: RPFC-86
Ping
One request for the ping bot: remove old ping comments when adding a new one. @sublimator
Hahaha :)
Sorry for the delay. I'll try to get to this later this week.
Nice one! Thanks Scott
@sublimator as these changes are expected to impact performance, please provide these details for the performance team to understand and test:
- Is this a new feature, bug fix, or improvement to existing functionality?
- What behavior/functionality does the change impact?
- In what processing can the impact be measured? Be as specific as possible - e.g. RPC client call, payment transaction that involves LOB, AMM, caching, DB operations, etc.
- Does this change affect concurrent processing - e.g. does it involve acquiring locks, multi-threaded processing, or async processing?
Thanks!
@intelliot
Will wait on Scott's input first (@seelabs) as I'm much busier now on other things than when I first opened this PR.
That said, I can quickly say:
- I expect the performance changes to be rather minor
- This PR represents initial baby steps and a renewed focus on this part of the code after many years (the child counts seem to have changed considerably)
- Includes the beginnings of a command to get shamap stats which could prove useful for subsequent changes that could improve performance
I'll plan on running this patch overnight and comparing the memory footprints of a server with and without this patch.
Thanks!
This is what I got from last night's run, but I'll need to do it again. I realized that the two servers had different path finding settings, and that makes a difference. I'll re-run the test again tonight. (I thought I'd show this result anyway, but we'll have a better apples to apples comparison tomorrow).
I doubt it will be much at all - from memory - IIRC you can save a bit more (a few hundred MB??) when allocating exactly, per the [experiment] tagged branch (quick hack PoC)
But I guess "take care of cents and the dollars take care of themselves"
@seelabs
Btw, somewhat related to path finding, I stumbled across this a while back: https://github.com/XRPLF/rippled/issues/4790
I think the order book is only used for path finding right?
Here are the charts from last night. I made sure both servers had the same configs (other than db paths, since I ran them on the same machine):
Given those charts it looks like this change uses a bit more memory than the current version - low hundreds of MB. Honestly, that's surprising; I did expect a memory savings. Let me double-check the scripts I used to run this to make sure I didn't make a dumb error like swapping "base" and "new" (after a quick check the script looks right to me, though).
@sublimator Thanks for the bug report on path finding. I'll comment there.
Given those charts it looks like this change uses a bit more memory than the current version
Interesting! I will have to have a look at it again. Been a while.
Do you have any theory as to why it might use more RAM? Assuming it is, and consistently so? I expected it to be marginal (even bordering on theoretical) at best, but I still thought there would be a measurable improvement. I spent more time eyeballing the [experiment] branch, to be fair, but figured this would be less controversial.
How much time do you have to dig into this? What can we salvage here? Does reverting the boundaries "fix" the memory increase? If so, it's potentially worth just merging that until there's time for the next round of investigation? But I'm not sure Ripple "works like that".
New is greater than old in the first chart, but the delta is negative in the subsequent charts, which suggests to me there is some mistaken switching going on. Is the first chart mislabeled, or are the other charts oddly reporting the change from new to old instead of the other way?
@thejohnfreeman The delta is negative because it's measuring "memory savings". So positive is a memory savings, since I use these scripts to find out "how much memory savings do I get with some new code". Yeah, that's a little weird, but it is working as intended.
@sublimator Let me revert the boundary changes and make sure there's no memory difference, and if there isn't I'll take a close look at the code. I think doing the correct allocation upfront is a good change (though I still need to look closely...), so yeah, I do think we want some of these changes even if we don't want the boundary changes yet.
I don't really have bandwidth atm to dive back into this much. While you're in the area, it may be worth looking at the experiment branch (insomuch as it seemed to reduce memory).
I think the shamap_info command could grow into something useful
Cheers
@seelabs @thejohnfreeman
Did you ever look at this more? What's the plan? What should be done with this?
Thanks!