QBFT fast recovery by frequent multicasting of RC message
PR description
- When a node moves up a round, e.g. from 6 -> 7, it only sends a single round-change message
- This means that if a node misses the round-change message from another node, it doesn't know that node is on round 7
- Nodes can publish their current round every N seconds while they are still in that round
This behaviour is enabled by the experimental option --Xqbft-fast-recovery. It's very effective when combined with --Xqbft-enable-early-round-change
- Added a FrequentMulticaster class to resend a message to validators at regular intervals. This is used for round-change messages when the fast-recovery option is enabled (a minimal sketch follows after this list)
- Moved early-round-change, fast-recovery and the frequent multicaster to the final state
- Added unit tests for the frequent multicaster and acceptance tests (ATs) for the new option.
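
For illustration, here is a minimal sketch of the idea behind a frequent multicaster: resend a fixed message to the validators at a regular interval until told to stop. The names below (FrequentMulticasterSketch, Multicaster) are placeholders rather than Besu's actual types; this only shows the general shape of periodically re-multicasting the latest round-change message, not the implementation in this PR.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a "frequent multicaster": resends the latest payload to all
 * validators at a fixed interval until stopped. All names here are
 * placeholders, not Besu's actual types.
 */
public class FrequentMulticasterSketch {

  /** Placeholder for whatever component actually sends a message to all validators. */
  public interface Multicaster {
    void multicast(byte[] message);
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final Multicaster multicaster;
  private final long intervalSeconds;
  private ScheduledFuture<?> task;

  public FrequentMulticasterSketch(final Multicaster multicaster, final long intervalSeconds) {
    this.multicaster = multicaster;
    this.intervalSeconds = intervalSeconds;
  }

  /** Start (or restart) periodic resending of the given message, e.g. the latest round-change. */
  public synchronized void startResending(final byte[] message) {
    stop();
    task =
        scheduler.scheduleAtFixedRate(
            () -> multicaster.multicast(message),
            intervalSeconds,
            intervalSeconds,
            TimeUnit.SECONDS);
  }

  /** Stop resending, e.g. once the node moves on to a new round or a block is imported. */
  public synchronized void stop() {
    if (task != null) {
      task.cancel(false);
      task = null;
    }
  }
}
```

Keeping the scheduling and the resend logic in one class like this mirrors the rationale given later in the thread for bundling everything into a single FrequentMulticaster rather than splitting it between a timer and the multicaster.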
@matthew1001 is working on moving QBFT functionality to a plugin. This would then become part of that plugin.
This pr is stale because it has been open for 30 days with no activity.
I think the plugin version of QBFT could be some way off @pullurib @macfarla. Could I ask that we move this PR along in its current shape? The logic will get included in the shift to a QBFT plugin as a matter of course.
Sure, I don't think we need to block this on making it a plugin. I'm probably not the right person to review this PR in detail though; someone with more QBFT knowledge should do that.
@pullurib there's a compile error that needs fixing before this can be reviewed in detail
This pr is stale because it has been open for 30 days with no activity.
Fixed the compile error and merged latest main
@pullurib from the commits it looked like maybe you tried an approach using RoundTimer first...is there a reason that didn't work out and you settled on a separate retrying thread that needs to be stopped at the appropriate time?
At first glance, it seems complex to have an executor service as well as a series of Timers managing the messages...but maybe there's a good reason for it?
Yeah, I needed something to trigger a multicast every few seconds. The first attempt had the logic spread across the timer and the multicaster, but it ran into some race conditions around timer expiry. Also, frequent multicasting was not really the timer's concern, so I bundled it all together in FrequentMulticaster.
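
To make that lifecycle concrete, here is a hedged sketch of how such a resender might be wired in, reusing the placeholder class from the description above: start resending when the node emits its round-change message, and stop once the round moves on. The hook names (onRoundChange, onNewRoundStarted) are assumptions for illustration, not Besu's actual QBFT API.

```java
/** Illustrative wiring only; the hook names below are assumed, not Besu's QBFT API. */
public class RoundChangeResendExample {
  private final FrequentMulticasterSketch.Multicaster network;
  private final FrequentMulticasterSketch resender;

  public RoundChangeResendExample(final FrequentMulticasterSketch.Multicaster network) {
    this.network = network;
    // Hypothetical 5-second resend interval.
    this.resender = new FrequentMulticasterSketch(network, 5L);
  }

  /** Called when this node moves up a round and emits its round-change message. */
  public void onRoundChange(final byte[] roundChangeMessage) {
    network.multicast(roundChangeMessage);       // send once, as today
    resender.startResending(roundChangeMessage); // then keep re-multicasting while in this round
  }

  /** Called when the round actually changes (or a block is imported), so resending must stop. */
  public void onNewRoundStarted() {
    resender.stop();
  }
}
```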
This pr is stale because it has been open for 30 days with no activity.
This pr was closed because it has been inactive for 14 days since being marked as stale.