besu QBFT fast recovery by frequent multicasting of RC message

PR description

When a node moves up a round, e.g. from 6 -> 7, it sends only a single round-change message.
This means that if another node misses that round-change message, it does not know the sender is now on round 7.
With this change, nodes can publish their current round every N seconds while they remain in that round.

Enabled by the experimental option --Xqbft-fast-recovery. It is very effective when combined with --Xqbft-enable-early-round-change.
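To illustrate the problem described above, here is a minimal sketch of a receiver's view of its peers' rounds. PeerRoundView and its methods are hypothetical names invented for this illustration and are not taken from Besu. The point is only that a lost one-off message leaves the view out of date until the sender announces its round again.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not Besu's actual data structure. */
class PeerRoundView {
  // Highest round each validator has announced to us so far.
  private final Map<String, Integer> lastKnownRound = new HashMap<>();

  /** Records a round-change announcement; older or repeated rounds never lower the view. */
  void onRoundChange(String validator, int round) {
    lastKnownRound.merge(validator, round, Integer::max);
  }

  /** Round we believe the validator is on, or -1 if we have heard nothing from it. */
  int roundOf(String validator) {
    return lastKnownRound.getOrDefault(validator, -1);
  }
}
```

If the single round-7 message from a validator is dropped, roundOf keeps returning 6 for it. Because repeated announcements are idempotent here, resending the same round every N seconds is harmless for receivers that already saw it and corrects the view of those that did not.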

Mar 07 '25 21:03 pullurib

Added a FrequentMulticaster class that resends a message to validators at regular intervals. It is used for round-change messages when the fast-recovery option is enabled.
Moved early-round-change, fast-recovery and frequent multicaster to final state.
Added unit tests for the frequent multicaster and ATs for the new option.
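For readers unfamiliar with the pattern, resending a message at regular intervals could be sketched roughly as follows. This is not the PR's FrequentMulticaster. The class name PeriodicResender, its methods, and the Consumer standing in for "multicast to validators" are all assumptions made for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: resends the latest message at a fixed interval until stopped. */
class PeriodicResender<M> {
  // Single daemon thread, so a forgotten resender does not keep the JVM alive.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(
          r -> {
            Thread t = new Thread(r, "periodic-resender");
            t.setDaemon(true);
            return t;
          });
  private final Consumer<M> multicast; // stand-in for "send to all validators"
  private final long intervalMillis;
  private ScheduledFuture<?> current; // guarded by "this"

  PeriodicResender(Consumer<M> multicast, long intervalMillis) {
    this.multicast = multicast;
    this.intervalMillis = intervalMillis;
  }

  /** Sends the message immediately, then every interval, replacing any earlier message. */
  synchronized void resendPeriodically(M message) {
    cancelCurrent();
    // Note: if multicast throws, scheduleAtFixedRate suppresses later executions.
    current =
        scheduler.scheduleAtFixedRate(
            () -> multicast.accept(message), 0, intervalMillis, TimeUnit.MILLISECONDS);
  }

  /** Stops resending, e.g. when the round ends; a send already in flight may still complete. */
  synchronized void stop() {
    cancelCurrent();
  }

  /** Stops resending and releases the thread; the resender cannot be reused afterwards. */
  synchronized void shutdown() {
    cancelCurrent();
    scheduler.shutdownNow();
  }

  private void cancelCurrent() {
    if (current != null) {
      current.cancel(false);
      current = null;
    }
  }
}
```

In this sketch a caller would invoke resendPeriodically with the new round-change message each time the round changes, and stop once the round completes. Keeping the schedule and its cancellation in one class means only one object has to be told when the round is over.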

Mar 12 '25 05:03 pullurib

@matthew1001 is working on moving QBFT functionality to a plugin. This would then be part of that plugin.

Mar 18 '25 00:03 pullurib

This pr is stale because it has been open for 30 days with no activity.

Apr 17 '25 02:04 github-actions[bot]

I think the plugin version of QBFT could be some way off, @pullurib @macfarla. Could I ask that we move this PR along in its current shape? The logic will get included in the shift to a QBFT plugin as a matter of course.

Apr 28 '25 10:04 matthew1001

> I think the plugin version of QBFT could be some way off, @pullurib @macfarla. Could I ask that we move this PR along in its current shape? The logic will get included in the shift to a QBFT plugin as a matter of course.

sure, I don't think we need to block this on making it a plugin. I'm prob not the right person to review this PR in detail though, someone with more QBFT knowledge should do that.

Apr 30 '25 04:04 macfarla

@pullurib there's a compile error that needs fixing before this can be reviewed in detail

May 29 '25 23:05 macfarla

This pr is stale because it has been open for 30 days with no activity.

Jun 29 '25 02:06 github-actions[bot]

Fixed the compile error and merged latest main

Jul 08 '25 19:07 pullurib

@pullurib from the commits it looked like maybe you tried an approach using RoundTimer first...is there a reason that didn't work out and you settled on a separate retrying thread that needs to be stopped at the appropriate time?

At first glance, it seems complex to have an executor service as well as a series of Timers managing the messages...but maybe there's a good reason for it?

Jul 23 '25 03:07 siladu

> @pullurib from the commits it looked like maybe you tried an approach using RoundTimer first...is there a reason that didn't work out and you settled on a separate retrying thread that needs to be stopped at the appropriate time?
>
> At first glance, it seems complex to have an executor service as well as a series of Timers managing the messages...but maybe there's a good reason for it?

Yeah, I needed something to trigger a multicast every few seconds. The first attempt had the logic spread across the timer and the multicaster, but ran into some race conditions around timer expiry. Also, frequent multicasting was not really the timer's concern, so I bundled it all together in FrequentMulticaster.

Jul 23 '25 20:07 pullurib

This pr is stale because it has been open for 30 days with no activity.

Aug 23 '25 02:08 github-actions[bot]

This pr was closed because it has been inactive for 14 days since being marked as stale.

Sep 07 '25 02:09 github-actions[bot]
