lighthouse
Fallback node briefly out of sync during block production due to a race condition
Description
Our basic simulator tests have been flaky and I think it may be caused by a race condition.
I think what's happening is this: validator 0 proposes a block and broadcasts it to node 0 and node 1:
- node 0 publishes the block
- node 1 receives it via gossip and also via HTTP (as the fallback node). However, the block doesn't get imported via either path, so the node stays out of sync for one slot until a block lookup is triggered.
The race condition prevents the block from being imported: both code paths return `RepeatBlob` when gossip-verifying the blobs.
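To make the failure mode concrete, here is a deliberately simplified, hypothetical Rust model (not Lighthouse code). It assumes a single shared "observed blobs" set that a delivery path marks before its import completes; `ObservedBlobs`, `process`, and the `completes_import` flag are invented for illustration. Under that assumption, whichever path marks the blob second gets `RepeatBlob`, and if the first path does not finish the import, neither path imports it.

```rust
use std::collections::HashSet;

/// Hypothetical blob identifier: (slot, blob index).
type BlobId = (u64, u64);

/// Possible results of one delivery path in this toy model.
#[derive(Debug, PartialEq)]
enum Outcome {
    Imported,
    /// Another path already recorded this blob (mirrors the `RepeatBlob` error).
    RepeatBlob,
    /// This path recorded the blob but stopped before importing it (hypothetical).
    Abandoned,
}

/// Hypothetical stand-in for a node's shared "observed blobs" cache;
/// not Lighthouse's actual type.
#[derive(Default)]
struct ObservedBlobs {
    seen: HashSet<BlobId>,
}

impl ObservedBlobs {
    /// Records the blob and returns `true` only for the first observation.
    fn observe(&mut self, id: BlobId) -> bool {
        self.seen.insert(id)
    }
}

/// One delivery path (gossip or HTTP). The blob is marked as observed
/// before the import finishes; `completes_import` models whether this
/// path actually goes on to import it.
fn process(cache: &mut ObservedBlobs, id: BlobId, completes_import: bool) -> Outcome {
    if !cache.observe(id) {
        // Losing the race: the blob is treated as a duplicate and dropped.
        return Outcome::RepeatBlob;
    }
    if completes_import {
        Outcome::Imported
    } else {
        Outcome::Abandoned
    }
}

fn main() {
    let id: BlobId = (42, 0);

    // Ordering A: the HTTP path records the blob first and imports it.
    let mut cache = ObservedBlobs::default();
    let http = process(&mut cache, id, true);
    let gossip = process(&mut cache, id, true);
    println!("ordering A: http={:?} gossip={:?}", http, gossip);

    // Ordering B: the gossip path records the blob first but does not
    // finish the import, so the HTTP path drops it as a repeat and the
    // blob is never imported by either path.
    let mut cache = ObservedBlobs::default();
    let gossip = process(&mut cache, id, false);
    let http = process(&mut cache, id, true);
    println!("ordering B: gossip={:?} http={:?}", gossip, http);
}
```

The ordering between the two paths is what makes this a race: ordering A imports the blob, while ordering B leaves it unimported until something else (in the real node, a block lookup) fetches it again.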
In the simulator test setup (1 validator with 2 beacon nodes), this briefly hurts sync committee performance and causes the simulator test to fail, because the fallback node is briefly out of sync.
This will likely be addressed in @michaelsproul's PR; see this discussion thread for more details and the investigation: https://github.com/sigp/lighthouse/pull/5574#discussion_r1704944723