
Explicit heap trimming

Open • vlntb opened this issue 2 months ago • 1 comment

High Level Overview of Change

  • Introduces a MallocTrim helper in libxrpl to centralise calls to ::malloc_trim(0) on Linux/glibc and optionally record RSS before/after for debugging.
  • Wires mallocTrim into production paths:
    • Application::doSweep: call after sweeps, when we’ve just freed a meaningful amount of heap.
    • Online delete (SHAMapStore): call after clearCaches / prior-ledger cleanup, when online delete drops large chunks of in-memory SHAMap / ledger state.
    • NetworkOPs sync completion: call when transitioning to FULL operating mode after initial sync, when sync-related temporary allocations have been freed.
  • On non-Linux or non-glibc builds, the helper reports supported = false and is effectively a no-op.
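
A minimal sketch of what such a helper could look like, assuming a plain report struct and an RSS reading taken from /proc/self/statm. The names TrimReport, readRssKB, the recordRss parameter and the field names are hypothetical and chosen for illustration; they are not taken from the PR's MallocTrim.h. Only ::malloc_trim(0), the supported flag and the before/after RSS idea come from the description above.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>

#if defined(__linux__)
#include <unistd.h>  // ::sysconf
#endif
#if defined(__linux__) && defined(__GLIBC__)
#include <malloc.h>  // ::malloc_trim (glibc extension)
#endif

// Hypothetical report type; the PR's real type may differ.
struct TrimReport
{
    bool supported = false;  // false on non-Linux or non-glibc builds
    int trimResult = -1;     // malloc_trim: 1 if memory was released, 0 if not
    std::optional<std::int64_t> rssBeforeKB;
    std::optional<std::int64_t> rssAfterKB;
};

// Read resident set size in KB from /proc/self/statm.
// The second field of that file is the resident size in pages.
inline std::optional<std::int64_t>
readRssKB()
{
#if defined(__linux__)
    std::FILE* f = std::fopen("/proc/self/statm", "r");
    if (!f)
        return std::nullopt;
    long long sizePages = 0;
    long long residentPages = 0;
    int const n = std::fscanf(f, "%lld %lld", &sizePages, &residentPages);
    std::fclose(f);
    if (n != 2)
        return std::nullopt;
    long const pageSize = ::sysconf(_SC_PAGESIZE);
    if (pageSize <= 0)
        return std::nullopt;
    return static_cast<std::int64_t>(residentPages * pageSize / 1024);
#else
    return std::nullopt;
#endif
}

// Ask glibc to return free heap pages to the OS.
// RSS is sampled only when recordRss is true, to keep the default path cheap.
inline TrimReport
mallocTrim(bool recordRss = false)
{
    TrimReport report;
#if defined(__linux__) && defined(__GLIBC__)
    report.supported = true;
    if (recordRss)
        report.rssBeforeKB = readRssKB();
    report.trimResult = ::malloc_trim(0);
    if (recordRss)
        report.rssAfterKB = readRssKB();
#else
    (void)recordRss;  // nothing to do: helper is a no-op here
#endif
    return report;
}
```

A call site under these assumptions would invoke mallocTrim(true) right after a cleanup step and log rssBeforeKB and rssAfterKB when both are present. Keeping the platform check inside the helper means callers need no #ifdef of their own.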

Context of Change

  • Long-running nodes on Linux/glibc showed steady Resident growth under heavier ledger-state scenarios.
  • 24h baseline without malloc_trim:
    • Resident: ~14.0 GB → ~32.9 GB (+18.6 GB, ~0.76 GB/h).
    • Referenced: low teens → ~32.7 GB (~+29 GB, ~1.2 GB/h).
  • 24h run with malloc_trim from Application::doSweep + online delete:
    • Resident: ~18.3 GB → ~19.0 GB (+0.7 GB, ~0.03 GB/h).
    • Referenced: ~18.0 GB → ~18.8 GB (+0.8 GB, ~0.03 GB/h).
  • Net effect over 24h:
    • ~42% lower Resident/Referenced at the end of the run (~32.9 GB → ~19.0 GB).
    • Growth rate drops from ~0.76 GB/h → ~0.03 GB/h (Resident) and ~1.2 GB/h → ~0.03 GB/h (Referenced), i.e. ~25–40× slower accumulation.
  • Over a longer ~40h trimmed run, we see ~147 GB returned to the OS (~3.7–3.8 GB/h) while steady-state Resident lives in an 18–20 GB band and the Resident–Referenced gap stays small (~0.2 GB). This suggests:
    • malloc_trim is doing real work against fragmentation and short-lived churn.
    • The remaining slow drift is driven by genuine long-lived working set, which will need separate follow-up (caches, data-structure sizing, protocol state), but the production hooks in this PR already give a clear and measurable win.
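
The derived figures above can be re-checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the inputs are the rounded values quoted in this description, so the results are approximate.

```cpp
// Average growth rate in GB/h between two memory samples taken `hours` apart.
constexpr double
growthRateGBPerHour(double startGB, double endGB, double hours)
{
    return (endGB - startGB) / hours;
}

// Fraction by which the trimmed end-of-run value is below the baseline one.
constexpr double
relativeReduction(double baselineGB, double trimmedGB)
{
    return (baselineGB - trimmedGB) / baselineGB;
}

// How many times slower the trimmed growth rate is than the baseline rate.
constexpr double
slowdownFactor(double baselineRate, double trimmedRate)
{
    return baselineRate / trimmedRate;
}
```

With the trimmed-run Resident samples, growthRateGBPerHour(18.3, 19.0, 24.0) gives roughly 0.03 GB/h, relativeReduction(32.9, 19.0) gives roughly 0.42, and slowdownFactor(0.76, 0.03) gives roughly 25.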

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (non-breaking change that only restructures code)
  • [x] Performance (increase or change in throughput and/or latency)
  • [ ] Tests (you added tests for code that already exists, or your new feature included in this PR)
  • [ ] Documentation update
  • [ ] Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • [ ] Release

API Impact

  • [ ] Public API: New feature (new methods and/or new fields)
  • [ ] Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • [ ] libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • [ ] Peer protocol change (must be backward compatible or bump the peer protocol version)

vlntb • Nov 11 '25 16:11

Codecov Report

:x: Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 78.6%. Comparing base (c9f17dd) to head (030e649).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/libxrpl/basics/MallocTrim.cpp | 97.8% | 1 Missing :warning: |
| src/xrpld/app/main/Application.cpp | 0.0% | 1 Missing :warning: |

Additional details and impacted files

@@           Coverage Diff           @@
##           develop   #6022   +/-   ##
=======================================
  Coverage     78.6%   78.6%           
=======================================
  Files          818     820    +2     
  Lines        68938   68994   +56     
  Branches      8240    8236    -4     
=======================================
+ Hits         54177   54222   +45     
- Misses       14761   14772   +11     

| Files with missing lines | Coverage Δ |
|---|---|
| include/xrpl/basics/MallocTrim.h | 100.0% <100.0%> (ø) |
| src/xrpld/app/misc/NetworkOPs.cpp | 69.9% <100.0%> (+<0.1%) :arrow_up: |
| src/xrpld/app/misc/SHAMapStoreImp.cpp | 76.1% <100.0%> (+0.7%) :arrow_up: |
| src/libxrpl/basics/MallocTrim.cpp | 97.8% <97.8%> (ø) |
| src/xrpld/app/main/Application.cpp | 68.5% <0.0%> (-0.1%) :arrow_down: |

... and 3 files with indirect coverage changes


codecov[bot] • Nov 13 '25 17:11