lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

Transition block lookup sync to range sync

Open dapplion opened this issue 1 year ago • 1 comments
trafficstars

Issue Addressed

For context of the issue 👇

  • https://github.com/sigp/lighthouse/issues/6099

Closes https://github.com/sigp/lighthouse/issues/6099

Proposed Changes

This solution links range sync and lookup sync by:

  • Trigger lookup sync with status messages that include unknown root. Currently range sync "swallows" peer statuses that are too close to head but on a potentially unknown fork. Range sync is not suitable to handle short forks, so instead deffer to lookup sync to retrieve that unknown root.
  • If lookup sync reaches the max chain length, trigger range sync on that set of peers. The idea is that most peers in that group should have this chain as canonical so we can blocks_by_range from them. In the future we may use https://github.com/ethereum/consensus-specs/pull/3845 to be sure what fork are we syncing from.

I believe this change is the minimal solution to cover the corner case of #6099. It's not maximally efficient as it will download a section of blocks twice. However this situation is rare, and we just need some way to recover even if it's at the expense of some extra bandwidth.

Testing

There are plans to with EF devops to test nasty long non-finality situations, but not now. For now I can try to write representative unit sync tests with some sample scenarios. This change should not impact good network codepaths, so should have no downsides. Range sync will only be triggered when we need to discover a long unknown fork, which has only happened in LH in cooked testnets of lookup sync bugs.

dapplion avatar Jul 17 '24 09:07 dapplion

Updated tests to assert a syncing chain is created. Plus removed down-scoring the peers for a long chain

https://github.com/sigp/lighthouse/blob/4ec5594f4bf507b54542cf9a6b2943b322c90460/beacon_node/network/src/sync/block_lookups/mod.rs#L294-L299

As a final item we could check that for a syncing chain if it reaches the end, assert that the block root matches that syncing root

dapplion avatar Aug 01 '24 14:08 dapplion

@mergify queue

realbigsean avatar Oct 08 '24 18:10 realbigsean

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI. Then, re-embark the pull request into the merge queue by posting the comment @mergifyio refresh on the pull request.

mergify[bot] avatar Oct 08 '24 18:10 mergify[bot]

@mergify refresh

realbigsean avatar Oct 08 '24 21:10 realbigsean

refresh

✅ Pull request refreshed

mergify[bot] avatar Oct 08 '24 21:10 mergify[bot]

@mergify queue

realbigsean avatar Oct 08 '24 21:10 realbigsean

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 71c5388461df8e48bafb2e1ecd732eb29957ce03

mergify[bot] avatar Oct 08 '24 21:10 mergify[bot]