
fix: memory leak from policy registration

Open rodion-lim-partior opened this issue 1 year ago • 2 comments

Start a QBFT/IBFT cluster with an arbitrary number of non-validator nodes. Memory usage on the non-validator nodes grows without bound and eventually causes an OOM (see issue #1660). The issue is easy to reproduce by logging the number of validator sets held in the policy's registry at runtime.

Non-validator nodes call the RegisterValidatorSet method for every block, appending to the in-memory ProposerPolicy.registry. The GC cannot free this memory because the registry still references every registered set and ClearRegistry is never called. In some cases, restarting a Quorum node causes even validator nodes to keep adding new validator sets to the registry indefinitely.

The registry is not currently used by any other implementation code. This PR caps the number of validator sets held in memory at any one time, in case the registry is needed for future use cases. The cap covers the latest 128 blocks, aligned with the number of in-memory state transitions. Older registrations are evicted from the registry in order.
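To illustrate the capping behaviour described above, here is a minimal sketch of a bounded registry. It is not the code from this PR: `validatorSet`, `cappedRegistry`, `register`, and `size` are hypothetical names chosen for the example, and only the limit of 128 comes from the description.

```go
package main

import "fmt"

// maxRegistrySize is the cap quoted in the PR description.
const maxRegistrySize = 128

// validatorSet is a hypothetical stand-in for the real validator set type.
type validatorSet struct {
	blockNumber uint64
}

// cappedRegistry (hypothetical) keeps at most `limit` of the most recently
// registered validator sets, in registration order.
type cappedRegistry struct {
	limit int
	sets  []validatorSet
}

func newCappedRegistry(limit int) *cappedRegistry {
	if limit < 1 {
		limit = 1 // guard against a zero or negative cap
	}
	return &cappedRegistry{limit: limit, sets: make([]validatorSet, 0, limit)}
}

// register adds a set. If the registry is full, the oldest entry is dropped
// first by shifting the remaining entries left, so the backing array never
// grows beyond `limit` elements.
func (r *cappedRegistry) register(vs validatorSet) {
	if len(r.sets) == r.limit {
		copy(r.sets, r.sets[1:])
		r.sets = r.sets[:r.limit-1]
	}
	r.sets = append(r.sets, vs)
}

// size reports how many sets are currently retained.
func (r *cappedRegistry) size() int { return len(r.sets) }

func main() {
	reg := newCappedRegistry(maxRegistrySize)
	// Simulate one registration per block, as a non-validator node would do.
	for block := uint64(1); block <= 1000; block++ {
		reg.register(validatorSet{blockNumber: block})
	}
	// Prints: retained=128 oldest=873 newest=1000
	fmt.Printf("retained=%d oldest=%d newest=%d\n",
		reg.size(), reg.sets[0].blockNumber, reg.sets[reg.size()-1].blockNumber)
}
```

Shifting the slice costs one copy of up to 127 elements per block, which is negligible at this size. A ring buffer would avoid the copy if the cap were ever raised substantially.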

rodion-lim-partior avatar Dec 11 '23 10:12 rodion-lim-partior

@YoshihitoAso, this is possibly related to the issue you raised in https://github.com/BoostryJP/quorum/issues/56

frank-lim-partior avatar Dec 13 '23 03:12 frank-lim-partior

After implementing the fix, we profiled the non-validator nodes. Memory consumption now stabilizes instead of growing indefinitely as it did before.

Please refer to the attached memory consumption data for the past 3 days. image

We used pprof snapshots to compare the memory usage before and after the fix. Here are the key observations:

  • Before the Fix: The pprof snapshot shows that most of the memory consumption came from 'validator.newDefaultSet'. image

  • After the Fix: Upon re-profiling, 'validator.newDefaultSet' no longer appears in the top 20 memory consumers. Instead, only 'leveldb' remains among the top 5 memory consumers, which is expected during block minting. image
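For anyone who wants to repeat this kind of before/after comparison, the following is a minimal sketch of capturing two heap snapshots in-process with the standard `runtime/pprof` package. It is not how the snapshots above were collected, which this comment does not specify. The helper names `heapSnapshot` and `saveSnapshot`, the file names, and the simulated allocation are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapSnapshot (hypothetical helper) forces a GC so the profile reflects live
// objects, then returns the heap profile as bytes.
func heapSnapshot() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// saveSnapshot (hypothetical helper) writes a heap snapshot to path.
func saveSnapshot(path string) error {
	data, err := heapSnapshot()
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

func main() {
	if err := saveSnapshot("heap-before.pb.gz"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Stand-in for a growing registry: retain 64 MiB that the GC cannot free.
	retained := make([][]byte, 0, 1024)
	for i := 0; i < 1024; i++ {
		retained = append(retained, make([]byte, 64<<10))
	}

	if err := saveSnapshot("heap-after.pb.gz"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	runtime.KeepAlive(retained) // keep the allocation live until both snapshots exist
	fmt.Println("wrote heap-before.pb.gz and heap-after.pb.gz")
}
```

The two files can then be compared with something like `go tool pprof -top -base heap-before.pb.gz heap-after.pb.gz`, which should list the functions responsible for the growth between the snapshots.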

In summary: The fix has addressed the observed memory leak. 'validator.newDefaultSet' no longer contributes to excessive memory consumption, which matches the expected behavior. These observations show that the fix mitigates the leak and improves overall system stability and reliability.

cheeweng-tan-partior avatar Dec 13 '23 10:12 cheeweng-tan-partior