rippled icon indicating copy to clipboard operation
rippled copied to clipboard

Fixes a bug causing inefficient unlocking of Consensus.

Open mtrippled opened this issue 2 years ago • 1 comments

High Level Overview of Change

Performance fix for bug introduced in https://github.com/XRPLF/rippled/commit/e8a7b2a1fc5b62b7ea92bd12fa63b2852125906a

  • #4505

The bug causes a time-consuming part of the accept phase to remain within a lock that delays other consensus activities.

Context of Change

The bug introduced in phaseEstablish() broken logic to lock and unlock around portions of code that should be unlocked for optimal performance. Previously, that code had all been placed on the job queue. The improper unlocking came as a result of introducing a unique_lock within phaseEstablish() to then unlock and lock accordingly. However, this broke because the unique_lock introduced a 2nd lock. And the locks didn't know about each other so unlocking could only be done once, not for the lock_guard covering timerEntry(). This defeated the purpose and caused a time-consuming function to become locked. The fix is instead of lock_guard(s) protecting all consensus activities, use unique_lock instead and pass as parameter to each function that potentially calls phaseEstablish(). This allows the code in question to be safely unlocked and re-locked.

Type of Change

Another way to implement this fix maintains the same function signatures but requires manually manipulating the Consensus mutex. It's less code but I think it's less readable, and likely harder to follow when things are locked vs not. Another possible route is to turn Consensus into a coroutine that proceeds from start to finish. That would probably simplify things the most from a readability standpoint.

  • [X] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (non-breaking change that only restructures code)
  • [ ] Tests (you added tests for code that already exists, or your new feature included in this PR)
  • [ ] Documentation update
  • [ ] Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • [ ] Release

API Impact

  • [ ] Public API: New feature (new methods and/or new fields)
  • [ ] Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • [ ] libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • [ ] Peer protocol change (must be backward compatible or bump the peer protocol version)

It passes all existing unit tests. I tested and found that the new code unlocks and locks appropriately.

mtrippled avatar Nov 15 '23 17:11 mtrippled

@mtrippled : given #4505 was reverted for now, should this PR still be reviewed?

intelliot avatar Jan 25 '24 06:01 intelliot