lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

Enforce batch buffer limit in range sync

Open jmcph4 opened this issue 2 years ago • 5 comments

Issue Addressed

#4321

Proposed Changes

  • Add guard to SyncingChain::send_batch in order to prevent the batch buffer from exceeding BATCH_BUFFER_SIZE
  • Add tracing logs to the range sync codebase in order to allow more visibility into the state of the batch buffer

Additional Info

  • Potential performance hit? When adding tracing to unstable (by cherry-picking, see here), the largest I've seen the buffer grow to locally is ~15 but the current BATCH_BUFFER_SIZE is 5

jmcph4 avatar Sep 11 '23 02:09 jmcph4

So, as of c6580aa993d93a2776f97147e28416003ee206ee, this crits with:

$ cargo run --bin lighthouse -- beacon_node
Sep 11 03:06:41.001 INFO Syncing                                 est_time: 6 days 18 hrs, speed: 12.45 slots/sec, distance: 7297683 slots (144 weeks 5 days), peers: 16, service: slot_notifier
Sep 11 03:06:41.002 WARN Not ready for merge                     info: The --execution-endpoint flag is not specified, this is a requirement for the merge, service: slot_notifier
Sep 11 03:06:41.002 WARN Not ready for Capella                   info: The --execution-endpoint flag is not specified, this is a requirement post-merge, hint: try updating the execution endpoint, service: slot_notifier
Sep 11 03:06:45.728 CRIT Chain removed                           op: batch processing result, reason: WrongChainState("Robust target batch indicates inconsistent chain state: AwaitingDownload"), id: 3386274545006561415, from: 12, to: 228064, end_root: 0x3847…3619, current_target: 14, batches: 10, peers: 15, validated_batches: 6, state: Syncing, sync_type: Finalized, service: sync

I've cherry-picked both 42592b5bfcc126782217a286dc25215cf02f16d8 and cd5e0bd17b885716fc4101c13a7bb6339915d0e8 onto current unstable (1ff4033830a21d076ea0c6b2720bef28f342e4e4) and this behaviour is not present, so I assume this is how I'm handling the guard in send_batch (i.e., c6580aa993d93a2776f97147e28416003ee206ee).

jmcph4 avatar Sep 11 '23 03:09 jmcph4

I've pushed this back to v5.1.0 since I don't think it's critical for the Deneb release. Please correct me if I'm wrong.

cc @jmcph4 and @AgeManning

paulhauner avatar Feb 15 '24 08:02 paulhauner

Yeah seems reasonable. When @jmcph4 is back, perhaps we can shift this out of draft and make it ready for review, before moving forward.

AgeManning avatar Feb 28 '24 23:02 AgeManning

Pushed this to 5.2 since we want to get the 5.1 release out by next week. Please change it back if needed

pawanjay176 avatar Mar 07 '24 11:03 pawanjay176

Dropping this from v5.2.0 as it doesn't seem to be ready

michaelsproul avatar Apr 30 '24 07:04 michaelsproul