Reduce peer message traffic for ledger data
High Level Overview of Change
Several changes to help reduce message traffic and improve logging and visibility.
- Suppress duplicate
TMGetLedgerandTMLedgerDatamessages, reducing the overhead of processing those messages. - Reduce the number of those messages sent to peers.
- Improve logging related to those messages, as well as a few other areas.
- Introduces a new protocol version to gate a new feature on the
TMLedgerDatamessage, which allows multiple identical requests to be replied to with one message.
These changes are organized into several commits which are organized logically separating each functional operation. They can be merged as-is, or squashed.
Context of Change
Analysis of the issue that led to #5115 identified heavy TMGetLedger request and TMLedgerData response traffic between nodes leading up to the syncing incidents. It was later determined that those messages were more a symptom of the problem, and not the root cause. However, leading up to identification of the root cause, these changes were being implemented to cut down on those messages, detect duplicates, etc. That reduction in unnecessary traffic is still valuable, so it's being included here.
Type of Change
- [X] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [X] Performance (increase or change in throughput and/or latency)
API Impact
None.
Test Plan
- Verify that nodes are at least as reliable at staying in sync as they are now.
- Verify that the indicated messages are sent less frequently by upgraded nodes.
- Verify that peers that both have this change send fewer ledger messages to each other.
Future Tasks
I am still working on a follow-up to #4764 that makes use of these changes and other improvements to reduce the number of requests initiated by a given node in the first place.
Codecov Report
Attention: Patch coverage is 24.34988% with 320 lines in your changes missing coverage. Please review.
Project coverage is 77.9%. Comparing base (
7c9d652) to head (5438161). Report is 1 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #5126 +/- ##
=========================================
- Coverage 78.2% 77.9% -0.2%
=========================================
Files 790 791 +1
Lines 67653 67975 +322
Branches 8161 8321 +160
=========================================
+ Hits 52879 52968 +89
- Misses 14774 15007 +233
| Files with missing lines | Coverage Δ | |
|---|---|---|
| include/xrpl/basics/base_uint.h | 96.9% <100.0%> (+<0.1%) |
:arrow_up: |
| include/xrpl/protocol/LedgerHeader.h | 100.0% <ø> (ø) |
|
| src/xrpld/app/ledger/detail/TimeoutCounter.cpp | 88.4% <100.0%> (+1.3%) |
:arrow_up: |
| src/xrpld/app/ledger/detail/TimeoutCounter.h | 100.0% <ø> (ø) |
|
| src/xrpld/app/misc/NetworkOPs.h | 100.0% <ø> (ø) |
|
| src/xrpld/overlay/Peer.h | 100.0% <ø> (ø) |
|
| src/xrpld/overlay/detail/PeerImp.h | 12.8% <ø> (ø) |
|
| src/xrpld/overlay/detail/ProtocolVersion.cpp | 86.4% <ø> (ø) |
|
| include/xrpl/basics/CanProcess.h | 95.5% <95.5%> (ø) |
|
| src/xrpld/app/consensus/RCLConsensus.cpp | 65.4% <0.0%> (ø) |
|
| ... and 10 more |
@Bronek Any outstanding issues for @ximinez to address?
@Bronek Any outstanding issues for @ximinez to address?
Yes, just added some. The loop and callback inside critical section can be probably resolved with an explanation e.g. a comment in code. The assert ones are trivial to fix.
Commit message:
Reduce duplicate peer traffic for ledger data (#5126)
- Drop duplicate outgoing TMGetLedger messages per peer
- Allow a retry after 30s in case of peer or network congestion.
- Addresses RIPD-1870
- (Changes levelization. That is not desirable, and will need to be fixed.)
- Drop duplicate incoming TMGetLedger messages per peer
- Allow a retry after 15s in case of peer or network congestion.
- The requestCookie is ignored when computing the hash, thus increasing
the chances of detecting duplicate messages.
- With duplicate messages, keep track of the different requestCookies
(or lack of cookie). When work is finally done for a given request,
send the response to all the peers that are waiting on the request,
sending one message per peer, including all the cookies and
a "directResponse" flag indicating the data is intended for the
sender, too.
- Addresses RIPD-1871
- Drop duplicate incoming TMLedgerData messages
- Addresses RIPD-1869
- Improve logging related to ledger acquisition
- Class "CanProcess" to keep track of processing of distinct items
---------
Co-authored-by: Valentin Balaschenko <[email protected]>