nimbus-eth1

Sync: Rapidly find and track peer canonical heads

Open • jlokier opened this issue 2 years ago • 1 comment

First component of new sync approach.

This module fetches and tracks the canonical chain head of each connected peer. (Or in future, each peer we care about; we won't poll them all so often.)

This is for when we aren't sure of the block number of a peer's canonical chain head. Most of the time, after finding which block, it quietly polls to track small updates to the "best" block number and hash of each peer.

But sometimes that can get out of step: there may have been a reorg deeper than our tracking window, a burst of more than a few new blocks, network delays, downtime, or the peer may itself be syncing. Perhaps we stopped Nimbus and restarted it a while later, e.g. by suspending a laptop or pressing Control-Z. In those cases this module catches up. It is even possible that the best hash the peer gave us in the Status handshake has disappeared by the time we query for the corresponding block number, in which case we start at zero.

The steps here perform a robust and efficient O(log N) search that rapidly converges on the new best block, wherever the search starts, when that block has moved out of the polling window. They then confirm the boundary of the peer's canonical chain head and track that head in real time by polling. The method is robust to peer state changes at any time.
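To make the O(log N) claim concrete, here is a minimal sketch of one way such a search could work: gallop away from the starting guess with doubling steps, then bisect the remaining gap. It is not the module's actual code. The `has_block` callback is a hypothetical stand-in for asking the peer whether it has a canonical block at a given number, and the sketch assumes the peer's chain does not change during the search, which the real module cannot assume.

```python
from typing import Callable


def find_head(has_block: Callable[[int], bool], start: int = 0) -> int:
    """Return the highest block number for which has_block() is true.

    Assumptions of this sketch: block 0 (genesis) always exists, and every
    number at or below the head is present while the search runs.
    """
    start = max(start, 0)
    step = 1
    if has_block(start):
        # Gallop upward: double the step until a probe overshoots the head.
        lo, hi = start, start + step
        while has_block(hi):
            lo = hi
            step *= 2
            hi = lo + step
    else:
        # Gallop downward: double the step until a probe finds a present
        # block, or until we reach genesis (assumed present).
        hi, lo = start, max(start - step, 0)
        while lo > 0 and not has_block(lo):
            hi = lo
            step *= 2
            lo = max(lo - step, 0)
    # Here lo is present and hi is absent, so bisect the gap between them.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_block(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Both phases halve or double a distance on every probe, so the number of queries grows with the logarithm of the distance between the starting guess and the real head, whether the guess is too low or too high.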

The purpose is to:

  • Help with finding a peer common chain prefix ("fast sync pivot") in a consistent, fast and explicit way.

  • Catch up quickly after any long pause, whether from network downtime, the program not running, or a deep chain reorg.

  • Be able to display real-time peer states, so they are less mysterious.

  • Tell the beam/snap/trie sync processes when to start and what blocks to fetch, and keep those fetchers in the head-adjacent window of the ever-changing chain.

  • Help the sync process bootstrap usefully when we only have one peer, speculatively fetching and validating what data we can before we have more peers to corroborate the consensus.

  • Help detect consensus failures in the network.

We cannot assume a peer's canonical chain stays the same or only gains new blocks from one query to the next. There can be reorgs, including deep reorgs. When a reorg happens, the best block number can decrease if the new canonical chain is shorter than the old one, and the best block hash we previously knew can become unavailable on the peer. So we must detect when the current best block disappears and be able to reduce the block number.
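The reorg handling described above can be sketched as a single polling step. This is an illustration under assumptions, not the module's implementation: `PeerHead`, `poll_head`, the `canonical_hash` callback and the `window` size are all hypothetical names, and the callback stands in for a query that returns the hash of the peer's canonical block at a number, or `None` if the peer has no block there.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PeerHead:
    number: int
    hash: str


def poll_head(
    known: PeerHead,
    canonical_hash: Callable[[int], Optional[str]],
    window: int = 8,
) -> Optional[PeerHead]:
    """One polling step around the last known head.

    Returns the peer's current head if it lies within `window` blocks of the
    known head, or None if it has moved outside the window, in which case the
    caller should fall back to a full search.
    """
    current = canonical_hash(known.number)
    if current is None:
        # The known block number is gone: the canonical chain got shorter.
        # Step back and take the first block that is still present.
        lowest = max(known.number - window, 0)
        for n in range(known.number - 1, lowest - 1, -1):
            h = canonical_hash(n)
            if h is not None:
                return PeerHead(n, h)
        return None  # reorg deeper than the window
    # The number still exists, though its hash may differ after a reorg.
    best = PeerHead(known.number, current)
    for n in range(known.number + 1, known.number + window + 1):
        h = canonical_hash(n)
        if h is None:
            return best
        best = PeerHead(n, h)
    return None  # more new blocks than the window covers
```

A caller can compare the returned hash with the stored one to notice a same-height reorg, and treat `None` as the signal to restart the wider search.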

Also:

Add --newsync option and use it. This option enables new blockchain sync and real-time consensus algorithms that will eventually replace the old, very limited sync.

New sync is work in progress. It's included as an option rather than a code branch, because it's more useful for testing this way, and must not conflict anyway. It's off by default. Eventually this will become enabled by default and the option will be removed.

jlokier, Aug 24 '21 14:08

> This module fetches and tracks the canonical chain head of each connected peer. (Or in future, each peer we care about; we won't poll them all so often.)
>
> This is for when we aren't sure of the block number of a peer's canonical chain head. Most of the time, after finding which block, it quietly polls to track small updates to the "best" block number and hash of each peer.

Just curious, is this polling of each peer really necessary? Usually, after successful proof-of-work validation, each peer propagates a NewBlock message to the square root of its peers, and after each import it sends a NewBlockHashes message to all its peers. Wouldn't tracking the peer head based on those messages be enough?
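For illustration, a passive tracker of the kind suggested here might look like the sketch below. The names `apply_announcement` and `heads` are hypothetical, and the input is assumed to be the (hash, number) pairs carried by a NewBlockHashes message.

```python
from typing import Dict, Iterable, Optional, Tuple

Head = Tuple[int, str]  # (block number, block hash)


def apply_announcement(
    heads: Dict[str, Head],
    peer_id: str,
    announced: Iterable[Tuple[str, int]],
) -> Optional[Head]:
    """Record the highest-numbered announced block as the peer's head guess."""
    best = heads.get(peer_id)
    for block_hash, number in announced:
        # ">=" lets a same-height announcement replace the stored hash.
        if best is None or number >= best[0]:
            best = (number, block_hash)
    if best is not None:
        heads[peer_id] = best
    return best
```

This sketch only learns about a peer once that peer announces something, and it never lowers the stored block number.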

KonradStaniec, Aug 25 '21 11:08