rippled
rippled copied to clipboard
Introduce a slab-based memory allocator and optimize SHAMapItem
Introduce support for a slabbed allocator:
When instantiating a large amount of fixed-sized objects on the heap the overhead that dynamic memory allocation APIs impose will quickly become significant.
In some cases, allocating a large amount of memory at once and using a slabbing allocator to carve the large block into fixed-sized units that are used to service requests for memory out will help to reduce memory fragmentation significantly and, potentially, improve overall performance.
This commit introduces a new SlabAllocator<> class that wraps around the complexity and exposes an API that is similar to the C++ concept of an Allocator.
This class is not a general-purpose allocator and should not be used unless profiling and analysis of specific memory allocation patterns indicates that the additional complexity introduced will improve the performance of the system overall and subsequenting profiling proves that.
Optimize SHAMapItem and leverage new slab allocator:
The SHAMapItem class contains a variable-sized buffer that holds the serialized data associated with a particular item inside a SHAMap.
Prior to this commit, the buffer for the serialized data was allocated separately. Coupled with the fact that most instances of SHAMapItem were wrapped around a std::shared_ptr meant that an instantiation might result in up to three separate memory allocations.
This commit switches away from std::shared_ptr for SHAMapItem and uses boost::intrusive_ptr instead, allowing the reference count for an instance to live inside the instance itself. Coupled
with using a slab-based allocator to optimize memory allocation for the most commonly sized buffers, the net result is significant memory savings. In testing, the reduction in memory usage hovers between 400MB and 650MB. Other scenarios might result in larger savings.
High Level Overview of Change
Context of Change
Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Refactor (non-breaking change that only restructures code)
- [ ] Tests (You added tests for code that already exists, or your new feature included in this PR)
- [ ] Documentation Updates
- [ ] Release
I thought I saw a fairly recent push to this pull request. So I tried building it. The previously observed build failures on macOS are still present.
@scottschurr the following is Nik's branch with my my additional changes which should build https://github.com/greg7mdp/rippled/tree/slab_alloc
@greg7mdp I think you should open a new PR with all of the changes, and @nbougalis should close this one. What do you think, @HowardHinnant & @scottschurr ?
I have a slight preference for keeping this one open so I can remember what I said 19 days ago. But I won't object if everyone wants to move to a new PR.
@mtrippled I also think it would make sense to create a new PR. I suggested it a few days back, but @nbougalis said he would take care of it, probably he'd like to review the branch first. @scottschurr I just verified that this branch builds fine on mac (with some unrelated warnings).
@greg7mdp, I'm fine either way, either updating this pull request or creating a new one.
And, @greg7mdp, I tried building your branch locally. Yup, your version works way better on macOS than the previous version did. Thanks! I think I will wait until your version is in an official pull request, however, before I start a full fledged review of the revised code.
Thanks @scottschurr for trying the mac version. I did build my branch on all 3 platforms (linux, windows, mac). @nbougalis , we should either update that PR or start a new one, so the review can proceed.
I've addressed the concerns and issues highlighted by reviewers. I have also incorporated some of the ideas by @greg7mdp into SlabAllocator, and refined the memory allocation engine.
I believe that this is ready for another review and, provided it's signed-off, a good candidate for merging.
I expect a reduction in overall memory usage of at least 15% on mainnet servers, but it depends on a number of factors. On my test bed, the reduction is closer to 22%. Performance should also improve.
Thanks.
rippled/src/test/shamap/SHAMapSync_test.cpp:101:50: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
SHAMapNodeType::tnACCOUNT_STATE, std::move(makeRandomAS()));
^
/Users/howardhinnant/Development/rippled/src/test/shamap/SHAMapSync_test.cpp:101:50: note: remove std::move call here
SHAMapNodeType::tnACCOUNT_STATE, std::move(makeRandomAS()));
^~~~~~~~~~ ~
1 warning generated.
@nbougalis please review Howard's suggestions above, at your convenience.
@intelliot: done - I'll wait for @HowardHinnant to sign off and for CI to run, then I'll fold the [FOLD] commits, sign and mark this as Passed.
@scottschurr can you confirm whether your concerns have been resolved?
@intelliot, no, I can't. I have another higher priority code review I'm working on. I can't provide an ETA for when I'll get to this one.
I checked with @scottschurr and we agreed that he has no particular need to review this PR, and we think it would be best to assign a different reviewer in order to move this forward. scottschurr is currently focused on other reviews that have higher priority.
I'll find another reviewer shortly. If that other reviewer approves, then this PR can be merged.
Both @HowardHinnant and I have approved this already. How many reviewers does this need?
@seelabs confirmed this is ready to merge.
When merging:
- merge conflicts should be resolved
- the clang-formatting should be fixed
@drlongle just FYI since this was discussed with Mark today. This PR isn't blocking on your review, but please feel free to have a look and share any questions/comments.
@nbougalis could you resolve the merge conflict and clang-format?
This has a unit test failure that’s unrelated to these changes and likely traces back to the recent “simplified fee” proposal by @ximinez.
I merged develop and applied clang-format.patch. If acceptable to you, @nbougalis , this PR will be squashed when it is merged, so it's OK to leave the merge commit for the moment. Here is the commit message I suggest; @nbougalis please confirm that this looks good (or suggest an alternative).
Optimize SHAMapItem with slabbed allocator: (#4218)
* Introduce support for a slab-based memory allocator:
When instantiating a large amount of fixed-sized objects on the heap the
overhead that dynamic memory allocation APIs impose will quickly become
significant.
In some cases, allocating a large amount of memory at once and using a
slabbing allocator to carve the large block into fixed-sized units that
are used to service requests for memory out will help to reduce memory
fragmentation significantly and, potentially, improve overall
performance.
The new `SlabAllocator<>` class exposes an API that is _similar_ to the
C++ concept of an `Allocator` but it is not meant to be a
general-purpose allocator.
It should not be used unless profiling and analysis of specific memory
allocation patterns indicates that the additional complexity introduced
will improve the performance of the system overall, and subsequent
profiling proves it.
A helper class, `SlabAllocatorSet<>` simplifies handling of variably
sized objects that benefit from slab allocations.
This commit incorporates improvements suggested by Greg Popovitch
(@greg7mdp).
* Optimize `SHAMapItem` and leverage new slab allocator:
The `SHAMapItem` class contains a variable-sized buffer that holds the
serialized data associated with a particular item inside a `SHAMap`.
Previously, the buffer for the serialized data was allocated separately.
The fact that most instances of `SHAMapItem` were wrapped around a
`std::shared_ptr` meant that an instantiation might result in up to
three separate memory allocations.
This commit switches away from `std::shared_ptr` for `SHAMapItem` and
uses `boost::intrusive_ptr` instead, allowing the reference count for an
instance to live inside the instance itself. Coupled with using a
slab-based allocator to optimize memory allocation for the most commonly
sized buffers, the net result is significant memory savings. In testing,
the reduction in memory usage hovers between 400MB and 650MB. Other
scenarios might result in larger savings.
In performance testing with NFTs, this commit reduces memory size by
about 15% sustained over long duration.
* Avoid using std::shared_ptr when not necessary:
The `Ledger` class contains two `SHAMap` instances: the state and
transaction maps. Previously, the maps were dynamically allocated using
`std::make_shared` despite the fact that they did not require lifetime
management separate from the lifetime of the `Ledger` instance to which
they belong.
The two `SHAMap` instances are now regular member variables. Some smart
pointers and dynamic memory allocation was avoided by using stack-based
alternatives.
I think that the three commits are functionally separate, and each is self-contained. IMO keeping them as-is is cleaner, more professional and makes it easier to review and, if needed, run git bisect in the future.
We had a comparable situation with https://github.com/XRPLF/rippled/pull/4192: the PR author recommended keeping the commits as-is, and so we did. Later, we learned there was some discontent with that, and that we should have squashed all the commits together.
I don't personally have any opinion either way, but if we're going to recommend merging this as three separate commits, then we should at least confirm that each commit stands on its own and passes unit tests & clang-format. It may make sense to open 2-3 new PRs for this purpose.
What we decided in #4192 is that if reviewers feel that a commit should be merged, and the commit message does not begin with [FOLD], then the reviewer should recommend a change to the commit message. And vice-versa.
blocked: let's get some agreement on whether this PR should end up as 3 commits, or 1.
If it's 3 commits, I think it would be best to open a couple of separate PRs the individual parts, so that we can confirm they are mergable on their own (they make sense and pass unit tests & clang-format).
@seelabs confirmed 3 commits for this is fine. I am rebasing the commits individually to ensure they pass CI on GitHub. If they pass, I will merge them no earlier than 2023-04-11 at 8:00 AM PT. If you have any objections, please comment here on this PR.
Suggested commit message:
Avoid using std::shared_ptr when not necessary: (#4218)
The `Ledger` class contains two `SHAMap` instances: the state and
transaction maps. Previously, the maps were dynamically allocated using
`std::make_shared` despite the fact that they did not require lifetime
management separate from the lifetime of the `Ledger` instance to which
they belong.
The two `SHAMap` instances are now regular member variables. Some smart
pointers and dynamic memory allocation was avoided by using stack-based
alternatives.
Commit 3 of 3 in #4218.
will merge after github checks pass, if I don't hear any objections
@manojsdoshi to try to corroborate the 15-22% memory use reduction
Released in 1.11.0.