QBFT fast recovery by frequent multicasting of RC message

Open pullurib opened this issue 8 months ago • 7 comments

PR description

  • When a node moves up a round, e.g. from 6 -> 7, it only sends a single round-change message
  • This means that if a node misses the round-change message from another node it doesn't know the node is on round 7
  • Nodes can publish their current round every N seconds while they are still in that round

Enabled by the experimental option --Xqbft-fast-recovery. It's very effective when combined with --Xqbft-enable-early-round-change
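To illustrate why a single round-change message is fragile, here is a minimal sketch in plain Java. It is not Besu code: `PeerRoundView` and `LossyLink` are hypothetical names, and the "link" simply drops a fixed number of messages. It only shows that a peer who misses the one message keeps a stale view until the sender repeats it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of one validator's view of its peers' rounds (not Besu code). */
class PeerRoundView {
    private final Map<String, Integer> lastSeenRound = new HashMap<>();

    /** Record a round-change message from a peer, keeping the highest round seen. */
    void onRoundChange(String peer, int round) {
        lastSeenRound.merge(peer, round, Math::max);
    }

    /** Round we believe the peer is on, or defaultRound if we never heard from it. */
    int roundOf(String peer, int defaultRound) {
        return lastSeenRound.getOrDefault(peer, defaultRound);
    }
}

/** Hypothetical lossy link that drops the first `drops` messages it carries. */
class LossyLink {
    private int remainingDrops;

    LossyLink(int drops) {
        this.remainingDrops = drops;
    }

    /** Returns true if the message reached the receiver. */
    boolean send(PeerRoundView receiver, String from, int round) {
        if (remainingDrops > 0) {
            remainingDrops--;
            return false; // message lost; receiver's view is unchanged
        }
        receiver.onRoundChange(from, round);
        return true;
    }
}
```

With one dropped message, a single send of round 7 leaves the receiver believing the sender is still on round 6; a second send, standing in for the periodic re-publish, corrects the view.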

pullurib avatar Mar 07 '25 21:03 pullurib

  • Added a FrequentMulticaster class to resend a message to validators at regular intervals. It is used for round-change messages when the fast-recovery option is enabled
  • Moved early-round-change, fast-recovery and the frequent multicaster to final state
  • Added unit tests for the frequent multicaster and ATs for the new option.
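As a rough sketch of what "resend a message at regular intervals" can look like with the standard `java.util.concurrent` scheduler: the class below is an assumption made for illustration, not the FrequentMulticaster from this PR. The name `PeriodicResender` and its methods are hypothetical, and the `Runnable` stands in for the actual multicast call.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic resender; shape and names are assumptions, not the PR's code. */
class PeriodicResender implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> current;

    /** Send immediately, then repeat every interval until replaced or stopped. */
    synchronized void start(Runnable send, long interval, TimeUnit unit) {
        stop(); // only one message is resent at a time
        current = scheduler.scheduleAtFixedRate(send, 0, interval, unit);
    }

    /** Cancel the current resend loop, e.g. when the node leaves the round. */
    synchronized void stop() {
        if (current != null) {
            current.cancel(false); // let an in-flight send finish
            current = null;
        }
    }

    @Override
    public synchronized void close() {
        stop();
        scheduler.shutdownNow();
    }
}
```

Note that `scheduleAtFixedRate` suppresses later executions if the task throws, so a real sender would likely catch and log exceptions inside the `Runnable`.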

pullurib avatar Mar 12 '25 05:03 pullurib

@matthew1001 is working on moving QBFT functionality to a plugin. This would then be part of that plugin.

pullurib avatar Mar 18 '25 00:03 pullurib

This pr is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Apr 17 '25 02:04 github-actions[bot]

I think the plugin version of QBFT could be some way off. @pullurib @macfarla, could I ask that we move this PR along in its current shape? The logic will get included in the shift to a QBFT plugin as a matter of course.

matthew1001 avatar Apr 28 '25 10:04 matthew1001

Sure, I don't think we need to block this on making it a plugin. I'm probably not the right person to review this PR in detail though; someone with more QBFT knowledge should do that.

macfarla avatar Apr 30 '25 04:04 macfarla

@pullurib there's a compile error that needs fixing before this can be reviewed in detail

macfarla avatar May 29 '25 23:05 macfarla

This pr is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Jun 29 '25 02:06 github-actions[bot]

Fixed the compile error and merged latest main

pullurib avatar Jul 08 '25 19:07 pullurib

@pullurib from the commits it looked like maybe you tried an approach using RoundTimer first...is there a reason that didn't work out and you settled on a separate retrying thread that needs to be stopped at the appropriate time?

At first glance, it seems complex to have an executor service as well as a series of Timers managing the messages...but maybe there's a good reason for it?

siladu avatar Jul 23 '25 03:07 siladu

Yeah, I needed something to trigger a multicast every few seconds. The first attempt had the logic spread across the timer and the multicaster, but it ran into some race conditions around timer expiry. Also, frequent multicasting was not really the timer's concern, so I bundled it all together in FrequentMulticaster.
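One common way to make a periodic resend tolerant of a late-firing task is to have each task check that its round is still the current one before sending. The sketch below is only an illustration of that general guard, under the assumption that rounds are plain integers. It is not taken from this PR, and `RoundGuardedResend` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard: a resend task only fires if its round is still current. */
class RoundGuardedResend {
    private final AtomicInteger currentRound = new AtomicInteger();
    private final List<Integer> sentRounds = new ArrayList<>();

    /** Called when the node moves to a new round. */
    void enterRound(int round) {
        currentRound.set(round);
    }

    /** Build the task a scheduler would run repeatedly for the given round. */
    Runnable taskFor(int round) {
        return () -> {
            // A task scheduled for an older round becomes a no-op.
            if (currentRound.get() == round) {
                synchronized (sentRounds) {
                    sentRounds.add(round); // stand-in for the real multicast
                }
            }
        };
    }

    /** Snapshot of the rounds for which a message was actually sent. */
    List<Integer> sentRounds() {
        synchronized (sentRounds) {
            return new ArrayList<>(sentRounds);
        }
    }
}
```

If the task for round 7 fires once more after the node has entered round 8, nothing is sent for round 7, so cancellation does not have to win the race against the scheduler.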

pullurib avatar Jul 23 '25 20:07 pullurib

This pr is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Aug 23 '25 02:08 github-actions[bot]

This pr was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Sep 07 '25 02:09 github-actions[bot]