lighthouse

Fallback node out of sync briefly during block production due to race condition

Open · jimmygchen opened this issue 6 months ago · 0 comments

Description

Our basic simulator tests have been flaky, and I think a race condition may be the cause.

I think what's happening is this: validator 0 proposes a block and broadcasts it to both node 0 and node 1:

  • node 0 publishes the block
  • node 1 receives it via gossip, and also via HTTP (as the fallback node). However, the block is not imported via either path, so the node falls out of sync for one slot until a block lookup is triggered.

Because of the race condition, gossip verification of the blobs returns RepeatBlob in both code paths, which prevents the block from being imported.
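To make the failure mode concrete, here is a minimal Rust sketch of one interleaving that would produce RepeatBlob on both paths. It rests on assumptions, not on Lighthouse's code: both paths are assumed to record each blob in a single shared "observed" cache during gossip verification, the blobs are assumed to reach the two paths in different orders, and each path is assumed to abandon the import on its first RepeatBlob. ObservedBlobs, BlobError and run_interleaved are hypothetical names used only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical error mirroring the RepeatBlob outcome described in the issue.
#[derive(Debug, Clone, PartialEq)]
pub enum BlobError {
    RepeatBlob { index: u64 },
}

/// Hypothetical, simplified shared cache of blob indices already seen for one
/// block. This is NOT Lighthouse's real observation cache.
#[derive(Default)]
pub struct ObservedBlobs {
    seen: HashSet<u64>,
}

impl ObservedBlobs {
    /// Records `index` as seen, or reports a repeat if another path got there first.
    pub fn observe(&mut self, index: u64) -> Result<(), BlobError> {
        if self.seen.insert(index) {
            Ok(())
        } else {
            Err(BlobError::RepeatBlob { index })
        }
    }
}

/// Runs two verification paths (gossip and HTTP) against the same cache,
/// alternating one blob per step to model a race. Assumption: each path stops
/// at its first error and therefore never imports the block.
/// Returns (gossip_result, http_result).
pub fn run_interleaved(
    gossip: &[u64],
    http: &[u64],
) -> (Result<(), BlobError>, Result<(), BlobError>) {
    let mut observed = ObservedBlobs::default();
    let paths = [gossip, http];
    let mut results: [Option<Result<(), BlobError>>; 2] = [None, None];
    let steps = gossip.len().max(http.len());

    for step in 0..steps {
        for (p, blobs) in paths.iter().enumerate() {
            // Skip a path that has already aborted.
            if results[p].is_some() {
                continue;
            }
            if let Some(&index) = blobs.get(step) {
                if let Err(e) = observed.observe(index) {
                    results[p] = Some(Err(e));
                }
            }
        }
    }

    // A path that never hit an error verified all of its blobs.
    let [g, h] = results;
    (g.unwrap_or(Ok(())), h.unwrap_or(Ok(())))
}
```

With run_interleaved(&[0, 1], &[1, 0]), the gossip path claims blob 0 and the HTTP path claims blob 1 in the first step. In the second step, each path finds the other's blob already observed, so gossip returns RepeatBlob { index: 1 }, HTTP returns RepeatBlob { index: 0 }, and neither imports the block. If either path runs alone, for example run_interleaved(&[0, 1], &[]), it completes with Ok(()).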

In the simulator test setup (1 validator with 2 beacon nodes), this briefly degrades sync committee performance and causes the simulator test to fail, because the fallback node is out of sync for that slot.

This will likely be addressed in @michaelsproul's PR; see the discussion thread for more details on the investigation: https://github.com/sigp/lighthouse/pull/5574#discussion_r1704944723

jimmygchen, Aug 06 '24 07:08