rippled icon indicating copy to clipboard operation
rippled copied to clipboard

Introduce a slab-based memory allocator and optimize SHAMapItem

Open nbougalis opened this issue 3 years ago • 8 comments

Introduce support for a slabbed allocator:

When instantiating a large amount of fixed-sized objects on the heap the overhead that dynamic memory allocation APIs impose will quickly become significant.

In some cases, allocating a large amount of memory at once and using a slabbing allocator to carve the large block into fixed-sized units that are used to service requests for memory out will help to reduce memory fragmentation significantly and, potentially, improve overall performance.

This commit introduces a new SlabAllocator<> class that wraps around the complexity and exposes an API that is similar to the C++ concept of an Allocator.

This class is not a general-purpose allocator and should not be used unless profiling and analysis of specific memory allocation patterns indicates that the additional complexity introduced will improve the performance of the system overall and subsequenting profiling proves that.

Optimize SHAMapItem and leverage new slab allocator:

The SHAMapItem class contains a variable-sized buffer that holds the serialized data associated with a particular item inside a SHAMap.

Prior to this commit, the buffer for the serialized data was allocated separately. Coupled with the fact that most instances of SHAMapItem were wrapped around a std::shared_ptr meant that an instantiation might result in up to three separate memory allocations.

This commit switches away from std::shared_ptr for SHAMapItem and uses boost::intrusive_ptr instead, allowing the reference count for an instance to live inside the instance itself. Coupled with using a slab-based allocator to optimize memory allocation for the most commonly sized buffers, the net result is significant memory savings. In testing, the reduction in memory usage hovers between 400MB and 650MB. Other scenarios might result in larger savings.

High Level Overview of Change

Context of Change

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (non-breaking change that only restructures code)
  • [ ] Tests (You added tests for code that already exists, or your new feature included in this PR)
  • [ ] Documentation Updates
  • [ ] Release

nbougalis avatar Jun 29 '22 02:06 nbougalis

I thought I saw a fairly recent push to this pull request. So I tried building it. The previously observed build failures on macOS are still present.

scottschurr avatar Aug 09 '22 17:08 scottschurr

@scottschurr the following is Nik's branch with my my additional changes which should build https://github.com/greg7mdp/rippled/tree/slab_alloc

greg7mdp avatar Aug 09 '22 19:08 greg7mdp

@greg7mdp I think you should open a new PR with all of the changes, and @nbougalis should close this one. What do you think, @HowardHinnant & @scottschurr ?

mtrippled avatar Aug 09 '22 20:08 mtrippled

I have a slight preference for keeping this one open so I can remember what I said 19 days ago. But I won't object if everyone wants to move to a new PR.

HowardHinnant avatar Aug 09 '22 20:08 HowardHinnant

@mtrippled I also think it would make sense to create a new PR. I suggested it a few days back, but @nbougalis said he would take care of it, probably he'd like to review the branch first. @scottschurr I just verified that this branch builds fine on mac (with some unrelated warnings).

greg7mdp avatar Aug 09 '22 21:08 greg7mdp

@greg7mdp, I'm fine either way, either updating this pull request or creating a new one.

scottschurr avatar Aug 11 '22 00:08 scottschurr

And, @greg7mdp, I tried building your branch locally. Yup, your version works way better on macOS than the previous version did. Thanks! I think I will wait until your version is in an official pull request, however, before I start a full fledged review of the revised code.

scottschurr avatar Aug 11 '22 01:08 scottschurr

Thanks @scottschurr for trying the mac version. I did build my branch on all 3 platforms (linux, windows, mac). @nbougalis , we should either update that PR or start a new one, so the review can proceed.

greg7mdp avatar Aug 11 '22 02:08 greg7mdp

I've addressed the concerns and issues highlighted by reviewers. I have also incorporated some of the ideas by @greg7mdp into SlabAllocator, and refined the memory allocation engine.

I believe that this is ready for another review and, provided it's signed-off, a good candidate for merging.

I expect a reduction in overall memory usage of at least 15% on mainnet servers, but it depends on a number of factors. On my test bed, the reduction is closer to 22%. Performance should also improve.

Thanks.

nbougalis avatar Dec 19 '22 04:12 nbougalis

rippled/src/test/shamap/SHAMapSync_test.cpp:101:50: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
                SHAMapNodeType::tnACCOUNT_STATE, std::move(makeRandomAS()));
                                                 ^
/Users/howardhinnant/Development/rippled/src/test/shamap/SHAMapSync_test.cpp:101:50: note: remove std::move call here
                SHAMapNodeType::tnACCOUNT_STATE, std::move(makeRandomAS()));
                                                 ^~~~~~~~~~              ~
1 warning generated.

HowardHinnant avatar Jan 06 '23 16:01 HowardHinnant

@nbougalis please review Howard's suggestions above, at your convenience.

intelliot avatar Jan 12 '23 05:01 intelliot

@intelliot: done - I'll wait for @HowardHinnant to sign off and for CI to run, then I'll fold the [FOLD] commits, sign and mark this as Passed.

nbougalis avatar Jan 12 '23 21:01 nbougalis

@scottschurr can you confirm whether your concerns have been resolved?

intelliot avatar Jan 19 '23 00:01 intelliot

@intelliot, no, I can't. I have another higher priority code review I'm working on. I can't provide an ETA for when I'll get to this one.

scottschurr avatar Jan 20 '23 23:01 scottschurr

I checked with @scottschurr and we agreed that he has no particular need to review this PR, and we think it would be best to assign a different reviewer in order to move this forward. scottschurr is currently focused on other reviews that have higher priority.

I'll find another reviewer shortly. If that other reviewer approves, then this PR can be merged.

intelliot avatar Mar 20 '23 22:03 intelliot

Both @HowardHinnant and I have approved this already. How many reviewers does this need?

mtrippled avatar Mar 20 '23 22:03 mtrippled

@seelabs confirmed this is ready to merge.

When merging:

  • merge conflicts should be resolved
  • the clang-formatting should be fixed

intelliot avatar Mar 22 '23 17:03 intelliot

@drlongle just FYI since this was discussed with Mark today. This PR isn't blocking on your review, but please feel free to have a look and share any questions/comments.

intelliot avatar Mar 22 '23 17:03 intelliot

@nbougalis could you resolve the merge conflict and clang-format?

intelliot avatar Mar 22 '23 17:03 intelliot

This has a unit test failure that’s unrelated to these changes and likely traces back to the recent “simplified fee” proposal by @ximinez.

nbougalis avatar Mar 31 '23 18:03 nbougalis

I merged develop and applied clang-format.patch. If acceptable to you, @nbougalis , this PR will be squashed when it is merged, so it's OK to leave the merge commit for the moment. Here is the commit message I suggest; @nbougalis please confirm that this looks good (or suggest an alternative).

Optimize SHAMapItem with slabbed allocator: (#4218)

* Introduce support for a slab-based memory allocator:

When instantiating a large amount of fixed-sized objects on the heap the
overhead that dynamic memory allocation APIs impose will quickly become
significant.

In some cases, allocating a large amount of memory at once and using a
slabbing allocator to carve the large block into fixed-sized units that
are used to service requests for memory out will help to reduce memory
fragmentation significantly and, potentially, improve overall
performance.

The new `SlabAllocator<>` class exposes an API that is _similar_ to the
C++ concept of an `Allocator` but it is not meant to be a
general-purpose allocator.

It should not be used unless profiling and analysis of specific memory
allocation patterns indicates that the additional complexity introduced
will improve the performance of the system overall, and subsequent
profiling proves it.

A helper class, `SlabAllocatorSet<>` simplifies handling of variably
sized objects that benefit from slab allocations.

This commit incorporates improvements suggested by Greg Popovitch
(@greg7mdp).

* Optimize `SHAMapItem` and leverage new slab allocator:

The `SHAMapItem` class contains a variable-sized buffer that holds the
serialized data associated with a particular item inside a `SHAMap`.

Previously, the buffer for the serialized data was allocated separately.
The fact that most instances of `SHAMapItem` were wrapped around a
`std::shared_ptr` meant that an instantiation might result in up to
three separate memory allocations.

This commit switches away from `std::shared_ptr` for `SHAMapItem` and
uses `boost::intrusive_ptr` instead, allowing the reference count for an
instance to live inside the instance itself. Coupled with using a
slab-based allocator to optimize memory allocation for the most commonly
sized buffers, the net result is significant memory savings. In testing,
the reduction in memory usage hovers between 400MB and 650MB. Other
scenarios might result in larger savings.

In performance testing with NFTs, this commit reduces memory size by
about 15% sustained over long duration.

* Avoid using std::shared_ptr when not necessary:

The `Ledger` class contains two `SHAMap` instances: the state and
transaction maps. Previously, the maps were dynamically allocated using
`std::make_shared` despite the fact that they did not require lifetime
management separate from the lifetime of the `Ledger` instance to which
they belong.

The two `SHAMap` instances are now regular member variables. Some smart
pointers and dynamic memory allocation was avoided by using stack-based
alternatives.

intelliot avatar Apr 05 '23 06:04 intelliot

I think that the three commits are functionally separate, and each is self-contained. IMO keeping them as-is is cleaner, more professional and makes it easier to review and, if needed, run git bisect in the future.

nbougalis avatar Apr 06 '23 23:04 nbougalis

We had a comparable situation with https://github.com/XRPLF/rippled/pull/4192: the PR author recommended keeping the commits as-is, and so we did. Later, we learned there was some discontent with that, and that we should have squashed all the commits together.

I don't personally have any opinion either way, but if we're going to recommend merging this as three separate commits, then we should at least confirm that each commit stands on its own and passes unit tests & clang-format. It may make sense to open 2-3 new PRs for this purpose.

intelliot avatar Apr 08 '23 05:04 intelliot

What we decided in #4192 is that if reviewers feel that a commit should be merged, and the commit message does not begin with [FOLD], then the reviewer should recommend a change to the commit message. And vice-versa.

HowardHinnant avatar Apr 08 '23 12:04 HowardHinnant

blocked: let's get some agreement on whether this PR should end up as 3 commits, or 1.

If it's 3 commits, I think it would be best to open a couple of separate PRs the individual parts, so that we can confirm they are mergable on their own (they make sense and pass unit tests & clang-format).

intelliot avatar Apr 10 '23 19:04 intelliot

@seelabs confirmed 3 commits for this is fine. I am rebasing the commits individually to ensure they pass CI on GitHub. If they pass, I will merge them no earlier than 2023-04-11 at 8:00 AM PT. If you have any objections, please comment here on this PR.

intelliot avatar Apr 11 '23 00:04 intelliot

Suggested commit message:

Avoid using std::shared_ptr when not necessary: (#4218)

The `Ledger` class contains two `SHAMap` instances: the state and
transaction maps. Previously, the maps were dynamically allocated using
`std::make_shared` despite the fact that they did not require lifetime
management separate from the lifetime of the `Ledger` instance to which
they belong.

The two `SHAMap` instances are now regular member variables. Some smart
pointers and dynamic memory allocation was avoided by using stack-based
alternatives.

Commit 3 of 3 in #4218.

intelliot avatar Apr 11 '23 17:04 intelliot

will merge after github checks pass, if I don't hear any objections

intelliot avatar Apr 11 '23 17:04 intelliot

@manojsdoshi to try to corroborate the 15-22% memory use reduction

Released in 1.11.0.

intelliot avatar Jun 05 '23 23:06 intelliot