fix: memory leak from policy registration
Start a QBFT/IBFT cluster with an arbitrary number of non-validator nodes. Memory usage of the non-validator nodes grows indefinitely, eventually causing OOM (reference issue #1660). The issue can easily be reproduced by logging the number of validator sets in the policy's registry at runtime.
Non-validator nodes call the RegisterValidatorSet method for every single block, appending to the in-memory ProposerPolicy.registry. The GC is unable to free the memory because ClearRegistry is never called. In some cases, restarting the Quorum node causes even validator nodes to add new validator sets to the registry indefinitely.
The registry is currently not used anywhere else in the implementation. In case it is planned for future use cases, this PR caps the number of validator sets allowed in memory at any one time. The registry holds at most the latest 128 blocks, aligned with the number of in-memory state transitions. Older registrations are evicted sequentially.
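The capping behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `boundedRegistry`, `ValidatorSet`, `Register`, `Len`, and `maxRegistrySize` are hypothetical stand-ins, and the real registry lives on ProposerPolicy.

```go
package main

import "sync"

// maxRegistrySize mirrors the 128-block cap described in the PR text.
const maxRegistrySize = 128

// ValidatorSet is a hypothetical stand-in for the real validator set type.
type ValidatorSet struct {
	Height uint64
}

// boundedRegistry is a hypothetical, simplified registry that keeps only
// the most recent maxRegistrySize validator sets.
type boundedRegistry struct {
	mu   sync.Mutex
	sets []ValidatorSet
}

// Register appends valSet and evicts the oldest entries once the cap is exceeded.
func (r *boundedRegistry) Register(valSet ValidatorSet) {
	r.mu.Lock()
	defer r.mu.Unlock()

	r.sets = append(r.sets, valSet)
	if excess := len(r.sets) - maxRegistrySize; excess > 0 {
		// Shift the newest entries to the front of the slice.
		copy(r.sets, r.sets[excess:])
		// Zero the tail so evicted values are not kept reachable
		// through the backing array.
		for i := maxRegistrySize; i < len(r.sets); i++ {
			r.sets[i] = ValidatorSet{}
		}
		r.sets = r.sets[:maxRegistrySize]
	}
}

// Len reports how many validator sets are currently held.
func (r *boundedRegistry) Len() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.sets)
}
```

With this shape, calling `Register` once per block keeps `Len()` at or below 128 no matter how long the node runs, which is the property the cap is meant to provide.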
@YoshihitoAso, this is possibly related to the issue you raised in https://github.com/BoostryJP/quorum/issues/56
After implementing the fix, we profiled the non-validator nodes. Memory consumption now stabilizes instead of growing indefinitely as it did before.
Please refer to the attached memory consumption data for the past 3 days.
We used pprof snapshots to compare memory usage before and after the fix. Here are the key observations:
- Before the Fix: The pprof snapshot shows that most of the memory consumption stemmed from 'validator.newDefaultSet'.
- After the Fix: Upon re-profiling, 'validator.newDefaultSet' no longer appears in the top 20 memory consumers. Instead, only 'leveldb' remains among the top 5 memory consumers, which is expected during block minting.
In summary: The fix addresses the observed memory leak. Notably, 'validator.newDefaultSet' no longer contributes to excessive memory consumption after the fix, which matches the expected behavior. These observations indicate that the fix mitigates the leak and improves overall system stability and reliability.