
perf: Remove unnecessary caches

bthomee opened this issue 7 months ago · 6 comments

High Level Overview of Change

NuDB and RocksDB already use caches internally, making the additional caches in the code of limited value or even unnecessary. Moreover, some caches are defined but never used, so they can be removed outright.

Context of Change

The codebase contains many caches, and this PR is one of several refactors aimed at improving performance, such as reducing memory use, to support future account, trust line, and transaction growth.

This PR removes unnecessary caches in DatabaseNodeImp and SHAMapStoreImp; in the former we will instead rely on the caches built into the underlying NuDB or RocksDB stores, while the latter merely instantiated the caches but never used them. A performance analysis demonstrated that memory use decreased by 7% and CPU use increased by 2%, while all other metrics did not change significantly - a reasonable trade-off between memory and CPU. A more extensive performance analysis will be performed before this change is released.

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (non-breaking change that only restructures code)
  • [X] Performance (increase or change in throughput and/or latency)
  • [ ] Tests (you added tests for code that already exists, or your new feature included in this PR)
  • [ ] Documentation update
  • [ ] Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • [ ] Release

bthomee · May 20 '25 14:05

Codecov Report

:x: Patch coverage is 25.92593% with 20 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 78.6%. Comparing base (8449c6c) to head (f3ea3e9).

Files with missing lines Patch % Lines
src/libxrpl/nodestore/DatabaseNodeImp.cpp 20.0% 20 Missing :warning:
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           develop   #5439   +/-   ##
=======================================
  Coverage     78.6%   78.6%           
=======================================
  Files          818     818           
  Lines        68981   68892   -89     
  Branches      8244    8220   -24     
=======================================
- Hits         54195   54148   -47     
+ Misses       14786   14744   -42     
Files with missing lines Coverage Δ
include/xrpl/nodestore/detail/DatabaseNodeImp.h 73.9% <ø> (-5.0%) :arrow_down:
src/xrpld/app/misc/SHAMapStoreImp.cpp 75.5% <100.0%> (-0.5%) :arrow_down:
src/xrpld/app/misc/SHAMapStoreImp.h 95.8% <ø> (ø)
src/libxrpl/nodestore/DatabaseNodeImp.cpp 32.6% <20.0%> (-1.1%) :arrow_down:

... and 8 files with indirect coverage changes



codecov[bot] · May 20 '25 14:05

Just looping in @kuznetsss before approving, I don’t believe we do any node-level DB caching in Clio, right? So no changes should be needed on our side?

PeterChen13579 · May 20 '25 14:05

As I understand it, this is rippled's internal database, so no changes on Clio's side are required.

kuznetsss · May 20 '25 15:05

I was looking into memory usage a few weeks ago too and noticed some oddities with the node store cache. Sharing my notes.


So regarding the NodeStore cache, like you, I'm of the opinion that it is actually largely pointless.

  • Peer messages use the SHAMap as source of truth and use a custom wire encoding
  • There's already a TreeNodeCache for SHAMap nodes that you may want to have laying around

Why was it ever there!? Well, it seems that back in the day, for some reason the TreeNodeCache was actually keyed by an ID representing a node's position in the tree. This was also before NuDB - think LevelDB and SQLite. So you had the tree node cache often missing, and a slow k/v store, so I'm guessing it was a backup cache.

It's kind of crazy: we read from the node store, then create a DecodedBlob from the decompressed data, only to immediately make a copy of that, and then cache the copy "because reasons".

Nowadays? Shrug. It seems you can disable it by setting cache_age=0 and cache_size=0 in the [node_db] configuration.


Attached are my Claude notes from when I did the investigation.


NodeObject Cache Investigation

Summary of Findings

After investigating the NodeObject cache in rippled/xahaud, it appears to serve no clear purpose in the current architecture. Evidence suggests it might be a remnant from an earlier design that's now just adding overhead to node fetch operations. But without access to original design discussions, we can't be completely certain.

Historical Investigation

A Theory: TreeNodeCache Was Originally Keyed by NodeID

Looking at the July 2013 codebase (commit c645901d58), there's interesting evidence that might explain why the NodeObject cache exists. The code suggests there may have been a fundamental issue with how the TreeNodeCache was implemented.

What the 2013 Code Shows:

  1. The mTNByID Map (ripple_SHAMap.h:225):

    boost::unordered_map<SHAMapNode, SHAMapTreeNode::pointer> mTNByID;
    

    The map name itself - "Tree Nodes By ID" - suggests it was keyed by NodeID, not hash.

  2. The HashedObjectStore Cache (ripple_HashedObjectStore.h):

    class HashedObjectStore
    {
    public:
        HashedObjectStore (int cacheSize, int cacheAge);
    
        HashedObject::pointer retrieve (uint256 const& hash)
        {
            if (mLevelDB)
                return retrieveLevelDB (hash);
            return retrieveSQLite (hash);
        }
    
    private:
        TaggedCache<uint256, HashedObject, UptimeTimerAdapter> mCache;
        KeyCache <uint256, UptimeTimerAdapter> mNegativeCache;
    

    The HashedObjectStore had its own cache keyed by hash (uint256), separate from the TreeNodeCache.

  3. How getNode() Worked (ripple_SHAMap.cpp:204):

    SHAMapTreeNode::pointer node = checkCacheNode(id);  // Lookup by NodeID
    if (node) {
    #ifdef DEBUG
        if (node->getNodeHash() != hash)  // Then verify hash matches
            throw std::runtime_error("invalid node");
    #endif
    

    It seems to look up by NodeID first, then check if the hash matches - which could be problematic.

  4. The Fallback Path (ripple_SHAMap.cpp:820):

    SHAMapTreeNode::pointer SHAMap::fetchNodeExternalNT(const SHAMapNode& id, uint256 const& hash)
    {
        // When mTNByID misses...
        HashedObject::pointer obj(theApp->getHashedObjectStore().retrieve(hash));
    

    When the ID-based lookup failed, it fell back to a hash-based lookup in the HashedObjectStore.

  5. How HashedObjectStore Retrieved Objects (ripple_HashedObjectStore.cpp):

    HashedObject::pointer HashedObjectStore::retrieveLevelDB (uint256 const& hash)
    {
        HashedObject::pointer obj = mCache.fetch (hash);
    
        if (obj || mNegativeCache.isPresent (hash) || !theApp || !theApp->getHashNodeLDB ())
            return obj;
    
        // ... fetch from LevelDB if not in cache
        obj = LLRetrieve (hash, theApp->getHashNodeLDB ());
    
        if (!obj)
        {
            mNegativeCache.add (hash);
            return obj;
        }
    
        mCache.canonicalize (hash, obj);
        return obj;
    }
    

    The HashedObjectStore maintained its own cache using the hash as the key, which would succeed even when mTNByID missed.

This could have created an interesting problem:

  • The same content (same hash) might exist at different positions (different IDs) in different trees
  • An ID-based cache would miss when looking for that content at a different position
  • A hash-based cache would find it

Could NodeObject Cache Have Been a Workaround?

If the above theory is correct, the NodeObject cache (keyed by hash) might have been added to catch what the TreeNodeCache (keyed by ID) missed. This would explain why there were two caches doing seemingly similar things.

It's worth noting that in 2013, the backend databases (SQLite and LevelDB) would have been considerably slower than modern options like NuDB or RocksDB. Given these performance characteristics, having a secondary cache to avoid repeated database lookups when the TreeNodeCache missed would have made more sense.

What the 2014 Documentation Says

By July 2014, the team seemed aware of these issues. From the SHAMap README (commit ea44497):

"When we navigate the tree (say, like SHAMap::walkTo()) we currently ask each node for information that we could determine locally... The next refactor should remove all calls to SHAMapTreeNode::GetID()... Then we can change the SHAMap::mTNByID member to be mTNByHash."

This suggests they wanted to move away from ID-based lookups entirely.

Current State

Today's code uses direct pointer traversal:

std::shared_ptr<SHAMapTreeNode> node = parent->getChild(branch);
if (node || !backed_)
    return node;

It seems like the architecture has evolved significantly, but the NodeObject cache remains - possibly as a remnant of the old workaround.

Current Caching Layers

Looking at the current code, there appear to be three caching layers:

1. TreeNodeCache

  • Caches fully parsed tree nodes
  • Seems useful for historical ledger queries
  • Uses the modern hash-based lookup

2. NodeObject Cache

  • Caches raw serialized bytes WITHOUT the 9-byte prefix (the prefix is stripped during the copy)
  • Unclear what benefit it provides today
  • Forces an extra copy operation on every fetch to remove the prefix

3. In-Memory SHAMap

  • Direct pointer tree structure for current ledgers
  • Handles most operations without any cache lookups

Observed Inefficiencies

The Copy Problem

Every node fetch from disk appears to do an unnecessary copy:

// In DecodedBlob::createObject()
Blob data(m_objectData, m_objectData + m_dataBytes);  // COPY!

This happens because:

  • The decompressed data includes a 9-byte prefix
  • NodeObject doesn't want the prefix
  • So it copies everything except the first 9 bytes

Where NodeObject Cache Doesn't Seem to Help

  1. Current Ledger Operations: Use direct pointers, no cache needed
  2. Historical Queries: TreeNodeCache already has the parsed nodes
  3. P2P Syncing: Uses its own wire format
  4. Startup: Cache is empty, so it's just overhead
  5. RPC Queries: TreeNodeCache handles these better

Questions and Uncertainties

  • Was the NodeObject cache really a workaround for the ID-based TreeNodeCache? The evidence suggests it, but we can't be certain.
  • Are there any edge cases where the NodeObject cache provides value that we're missing?
  • Why wasn't it removed when the architecture changed? Fear of breaking something? Or does it serve some purpose we haven't identified?

Potential Impact of Removal

If the NodeObject cache truly serves no purpose, removing it could:

  • Eliminate the unnecessary copy on every disk read
  • Reduce memory usage
  • Simplify the code
  • Possibly improve performance

But without comprehensive testing, it's hard to be certain there aren't hidden dependencies.

Disabling the Cache

Interestingly, the NodeObject cache is already optional in the codebase. The cache initialization logic in DatabaseNodeImp.h shows how it can be disabled:

// Lines 46-76 from DatabaseNodeImp.h
std::optional<int> cacheSize, cacheAge;

if (config.exists("cache_size"))
{
    cacheSize = get<int>(config, "cache_size");
    if (cacheSize.value() < 0)
    {
        Throw<std::runtime_error>(
            "Specified negative value for cache_size");
    }
}

if (config.exists("cache_age"))
{
    cacheAge = get<int>(config, "cache_age");
    if (cacheAge.value() < 0)
    {
        Throw<std::runtime_error>(
            "Specified negative value for cache_age");
    }
}

if (cacheSize != 0 || cacheAge != 0)
{
    cache_ = std::make_shared<TaggedCache<uint256, NodeObject>>(
        "DatabaseNodeImp",
        cacheSize.value_or(0),
        std::chrono::minutes(cacheAge.value_or(0)),
        stopwatch(),
        j);
}

The key line is if (cacheSize != 0 || cacheAge != 0) - the cache is skipped only when both values are explicitly set to 0. (An unset std::optional also compares unequal to 0, so the cache is created by default.) Setting both to 0 leaves cache_ as nullptr.

Throughout DatabaseNodeImp.cpp, the code consistently checks if cache_ exists before using it:

// Line 55: storing objects
if (cache_ && !skipCache)

// Line 71: fetching objects  
if (cache_)

// Line 98: conditional fetch
cache_ ? cache_->fetch(hash) : nullptr

This means the cache can be disabled through configuration by setting both cache_age and cache_size to 0. When disabled:

  • All cache operations are skipped
  • The code falls back to direct backend access
  • No performance penalty from the unnecessary copy operation

This provides an easy way to test whether the cache provides any real benefit - just disable it and measure the impact.
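A sketch of such a [node_db] stanza - the type and path values are placeholders to be adapted to the local setup:

```ini
[node_db]
type=NuDB
path=/var/lib/rippled/db/nudb
# Both must be explicitly 0 for cache_ to remain nullptr.
cache_size=0
cache_age=0
```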

Additional Problems: P2P Cache Pollution

Another major issue with the NodeObject cache is that peer requests pollute it with one-off data. When peers request nodes via TMGetObjectByHash, the code fetches from NodeStore, which adds the data to the cache even though it's quite possibly never going to be used again, especially if it's for a historical ledger.

This means if you have the cache configured, it will continuously grow with useless data every time peers request nodes from you. The cache fills with garbage that has near-zero probability of reuse, evicting any potentially useful entries. It's essentially uncontrolled memory growth triggered by external peer requests. See .ai-docs/nodeobject-cache-p2p-fix.md for a proposed solution.

Recommendations for Further Investigation

  1. Test with cache_size=0 and cache_age=0 to disable the cache completely
  2. Run benchmarks comparing performance with and without the cache
  3. Monitor memory usage and query latency in both configurations
  4. Look for any unexpected behavior when cache is disabled
  5. Investigate P2P request patterns to quantify cache pollution

The fact that the cache is already optional suggests the developers recognized it might not always be necessary. Testing with it disabled would be a low-risk way to validate whether it's truly vestigial.

sublimator · Aug 13 '25 12:08

@sublimator I loved the historical context and the results from your deep dive - the amount of detail is very much appreciated!

As for your recommendations, our old performance analysis platform didn't let me modify the rippled.cfg, hence I created this PR in which I completely removed the code so I could evaluate its memory and CPU usage. It did show memory improvements at a slight increase in CPU. However, that platform turned out to have some shortcomings (for one, it measured system memory rather than process memory, while running on a cloud machine potentially shared with an arbitrary number of other services), so we have been working on a new platform. The new platform is just about ready, so I'll be rerunning the experiment there to ensure there are indeed performance improvements without any ill side-effects.

bthomee · Aug 14 '25 18:08

@bthomee

You're welcome - I assumed you could recontextualize/summarize it! Those were my Claude notes.

sublimator · Aug 18 '25 00:08