rippled
Intrusive shamap inner (SHAMapTreeNode memory reduction)
High Level Overview of Change
This branch is a memory optimization. It results in about a 2-2.5 GB savings (10%-15%) when measured on my local validator.
This branch has a long history. About two years ago I wrote a patch to
remove the mutex from shamap inner nodes (ref:
https://github.com/seelabs/rippled/tree/lockfree-tagged-cache). At the
time I measured a large memory savings of about 2 GB. Unfortunately,
the code required using the folly library, and I was hesitant to
introduce such a large dependency into rippled (especially one that was
so hard to build). This branch resurrects that old work and removes the
folly dependency.
The old branch used a lockless atomic shared pointer. This new branch introduces an intrusive pointer type. Unlike boost's intrusive pointer, this intrusive pointer can handle both strong and weak pointers (needed for the tagged cache). Since this is an intrusive pointer type, in order to support weak pointers the object is not destroyed when the strong count goes to zero. Instead, it is "partially destroyed" (for example, inner nodes will reset their children).
This intrusive pointer uses 16 bits for the strong count and 14 bits for the weak count, plus one 64-bit pointer to the object. This is much smaller than a std::shared_ptr, which needs a control block to hold the strong and weak counts (and potentially other data), as well as an extra pointer to point at the control block.
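To make the count layout above concrete, here is a minimal sketch (not the PR's actual code; the struct and member names are hypothetical) of packing a 16-bit strong count and a 14-bit weak count into a single 32-bit word, leaving two spare bits for flags such as a "partially destroyed" marker:

```cpp
#include <cstdint>

// Hypothetical sketch of the count layout described above: 16-bit strong
// count in the low bits, 14-bit weak count above it, in one 32-bit word.
// The real intrusive pointer then only needs this word (stored in the
// object itself) plus one 64-bit pointer to the object.
struct PackedCounts
{
    std::uint32_t bits = 0;

    static constexpr std::uint32_t strongMask = 0x0000FFFFu;  // low 16 bits
    static constexpr std::uint32_t weakShift = 16;
    static constexpr std::uint32_t weakMask = 0x3FFFu;  // next 14 bits

    std::uint32_t strong() const { return bits & strongMask; }
    std::uint32_t weak() const { return (bits >> weakShift) & weakMask; }

    void addStrong() { bits += 1; }                  // assumes no overflow
    void addWeak() { bits += (1u << weakShift); }    // assumes no overflow
};
```

Because both counts live in one word, a single atomic increment or exchange on that word can later update them together, which is what makes the planned atomic phase attractive.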
The intrusive shared pointer can be modified to support atomic operations (there is a branch that adds this support). These atomic operations can be used instead of the lock when changing inner node pointers in the shamap.
Note: The space savings is independent of removing the locks from
shamap inner nodes. Therefore this work is divided into two phases. In the
first phase a non-atomic intrusive pointer is introduced and the locks
are kept. In the second phase the atomic intrusive pointer could be
introduced and the locks removed. Some of the code in this patch
is written with the upcoming atomic work in mind (for example, using
exchange in places). The atomic intrusive pointer also requires the C++
library to support atomic_ref. Both gcc and msvc support this, but at
the time of this writing clang's library does not.
Note: The intrusive pointer will be 12 bytes. A std::shared_ptr will be around 40 bytes including its control block, depending on the implementation.
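On common 64-bit implementations the ~40-byte figure breaks down as two pointers in the shared_ptr itself plus a separately allocated control block of roughly 24 bytes. A quick sanity check of the two-pointer part (sizes are implementation-defined, so this is illustrative, not guaranteed):

```cpp
#include <memory>

// Illustrative only: shared_ptr's size is implementation-defined, but on
// the mainstream 64-bit implementations (libstdc++, libc++, MSVC) it holds
// two pointers: one to the object and one to the control block. The control
// block, which stores the strong/weak counts, is allocated separately and
// adds roughly another 24 bytes.
static_assert(sizeof(std::shared_ptr<int>) == 2 * sizeof(void*),
              "expected the common two-pointer shared_ptr layout");
```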
When measuring memory usage on a validator, this patch resulted in between a 10 and 15% memory savings.
Type of Change
Memory Optimization
FYI: This branch has some work toward making the shared pointer atomic: https://github.com/seelabs/rippled/tree/atomic_shamap_inner. With that patch we may be able to remove the inner node locks, but I'm putting that on the back burner for a while.
One important todo before this can be merged is to confirm that we can get away with 16-bit strong counts and 14-bit weak counts.
important todo
> 2 ** 16 -1
65535
> 2 ** 14 -1
16383
That's max refs for strong/weak?
Yes, that's right. It's trivial to change the counts to 32 bits and 30 bits, but at the cost of 4 extra bytes per object. If we can, I'd like to avoid those 4 extra bytes.
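If the 16-bit limit is kept, one defensive option (a hypothetical sketch, not code from this branch) is to reject increments at the ceiling rather than silently wrapping the count:

```cpp
#include <cstdint>

// Hypothetical overflow guard for a 16-bit strong count stored in the low
// bits of a packed 32-bit word. Not from the branch; it just illustrates
// the kind of check the count-size audit is meant to make unnecessary.
constexpr std::uint32_t strongMax = 0xFFFFu;  // 2**16 - 1 == 65535

inline bool
tryIncStrong(std::uint32_t& bits)
{
    if ((bits & strongMax) == strongMax)
        return false;  // count saturated; caller must handle the failure
    ++bits;
    return true;
}
```

The same pattern would apply to the 14-bit weak count (ceiling 2**14 - 1 == 16383) in the upper bits.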
at the cost of 4 extra bytes per object
I guess that adds up with the current node counts. Let's say ~20M for just one state tree. That's ~76MB there alone.
Damn mac eh
One important todo before this can be merged is to confirm that we can get away with 16-bit strong counts and 14-bit weak counts.
How would you suggest we make that confirmation? Do you know which kinds of workloads are likely to run up those reference counts?
@scottschurr I think we need to audit the code and understand all the places that are storing strong and weak shamap pointers and decide if there are ways this could blow up. I have not done that audit, but we'll need to do it before merging this. However, that audit shouldn't stop a review from starting. Even if we decide the limit is too low this is still worth merging, we'll just have to increase the limit (for slightly less memory savings).
important todo
> 2 ** 16 -1
65535
> 2 ** 14 -1
16383
That's max refs for strong/weak?
Yes, that's right. It's trivial to change that so it the counts are 32 bits and 30 bit, but at the cost of 4 extra bytes per object. If we can, I'd like to avoid those 4 extra bytes.
I think this is more than a todo. It's a blocker. We should not approve this PR until we know the answer to this question.
important todo
I think this is more than a todo. It's a blocker. We should not approve this PR until we know the answer to this question.
@HowardHinnant I agree, and I have no intention of merging this until I audit the code to decide if we need larger count sizes - maybe "important todo" wasn't worded as well as it should have been. On the other hand, the code can be reviewed as-is now; if I need to up the count sizes there will be a slightly smaller memory saving (but it should still be worth doing), and it will be trivial to review the changes to the count sizes. I don't think the review needs to wait for the audit of the count sizes.