lighthouse
lighthouse copied to clipboard
Transition block lookup sync to range sync
Issue Addressed
For context of the issue 👇
- https://github.com/sigp/lighthouse/issues/6099
Closes https://github.com/sigp/lighthouse/issues/6099
Proposed Changes
This solution links range sync and lookup sync by:
- Trigger lookup sync with status messages that include unknown root. Currently range sync "swallows" peer statuses that are too close to head but on a potentially unknown fork. Range sync is not suitable to handle short forks, so instead deffer to lookup sync to retrieve that unknown root.
- If lookup sync reaches the max chain length, trigger range sync on that set of peers. The idea is that most peers in that group should have this chain as canonical so we can
blocks_by_rangefrom them. In the future we may use https://github.com/ethereum/consensus-specs/pull/3845 to be sure what fork are we syncing from.
I believe this change is the minimal solution to cover the corner case of #6099. It's not maximally efficient as it will download a section of blocks twice. However this situation is rare, and we just need some way to recover even if it's at the expense of some extra bandwidth.
Testing
There are plans to with EF devops to test nasty long non-finality situations, but not now. For now I can try to write representative unit sync tests with some sample scenarios. This change should not impact good network codepaths, so should have no downsides. Range sync will only be triggered when we need to discover a long unknown fork, which has only happened in LH in cooked testnets of lookup sync bugs.
Updated tests to assert a syncing chain is created. Plus removed down-scoring the peers for a long chain
https://github.com/sigp/lighthouse/blob/4ec5594f4bf507b54542cf9a6b2943b322c90460/beacon_node/network/src/sync/block_lookups/mod.rs#L294-L299
As a final item we could check that for a syncing chain if it reaches the end, assert that the block root matches that syncing root
@mergify queue
queue
🛑 The pull request has been removed from the queue default
The merge conditions cannot be satisfied due to failing checks.
You can take a look at Queue: Embarked in merge queue check runs for more details.
In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.
@mergify refresh
refresh
✅ Pull request refreshed
@mergify queue
queue
✅ The pull request has been merged automatically
The pull request has been merged automatically at 71c5388461df8e48bafb2e1ecd732eb29957ce03