Update syncing logic to fix duplicate block requests
Motivation
As issue #3404 described, validators send out duplicate block requests when syncing. This PR updates the syncing logic to make sure we only remove a block response after it was added to the ledger or if we encountered an error.
For background, currently, syncing works as follows: BlockSync requests the blocks, and stores it in the internal responses map. Sync calls block_sync.process_next_block(current_height) to get the next block to process. This removes it from BlockSync’s responses and returns it to Sync. Sync performs the checks, and if successful, adds the block to the ledger. This also updates BlockSync’s canon.
The issue is that BlockSync‘s view of the canon is only updated after Sync is done processing blocks (and adding to the ledger). There’s a window where BlockSync is unaware of blocks that are pending validation in Sync. Thus, it re-requests them.
This PR adjusts the block processing interface to the Sync module as follows:
-
process_next_blockREADS a block from a particular height -
remove_block_responseREMOVES a block from a particular height, which MUST be called after advancing completed or failed
Test Plan
Ran it locally and verified it removes the duplicate block request.
The following figure shows a syncing process before the fix, with a longer pause until a duplicate request is generated:
The following figure shows the syncing process with the fix (note there is no double request and no delay, as such, the syncing is much faster):
Closes #3404