[fulu] head sync to handle forky condition
Describe the bug
The log below shows that the downloaded blocks are from one fork while the downloaded data column sidecars are from another fork:
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: Aug-14 08:19:19.191[sync] debug: ByRange requests beaconBlocksRequest={"start_slot":"32768","count":"32","step":"1"}, dataColumnRequest={"start_slot":"32768","count":"32","columns":["2","30","68","77","79","103","108","118"]}, allBlocks(8)=32771 32773 32780 32784 32786 32790 32797 32799, allDataColumnSidecars(160)=32768:2 32768:30 32768:68 32768:77 32768:79 32768:103 32768:108 32768:118 32769:2 32769:30 32769:68 32769:77 32769:79 32769:103 32769:108 32769:118 32772:2 32772:30 32772:68 32772:77 32772:79 32772:103 32772:108 32772:118 32774:2 32774:30 32774:68 32774:77 32774:79 32774:103 32774:108 32774:118 32775:2 32775:30 32775:68 32775:77 32775:79 32775:103 32775:108 32775:118 32776:2 32776:30 32776:68 32776:77 32776:79 32776:103 32776:108 32776:118 32777:2 32777:30 32777:68 32777:77 32777:79 32777:103 32777:108 32777:118 32778:2 32778:30 32778:68 32778:77 32778:79 32778:103 32778:108 32778:118 32779:2 32779:30 32779:68 32779:77 32779:79 32779:103 32779:108 32779:118 32782:2 32782:30 32782:68 32782:77 32782:79 32782:103 32782:108 32782:118 32783:2 32783:30 32783:68 32783:77 32783:79 32783:103 32783:108 32783:118 32785:2 32785:30 32785:68 32785:77 32785:79 32785:103 32785:108 32785:118 32787:2 32787:30 32787:68 32787:77 32787:79 32787:103 32787:108 32787:118 32788:2 32788:30 32788:68 32788:77 32788:79 32788:103 32788:108 32788:118 32789:2 32789:30 32789:68 32789:77 32789:79 32789:103 32789:108 32789:118 32792:2 32792:30 32792:68 32792:77 32792:79 32792:103 32792:108 32792:118 32793:2 32793:30 32793:68 32793:77 32793:79 32793:103 32793:108 32793:118 32794:2 32794:30 32794:68 32794:77 32794:79 32794:103 32794:108 32794:118 32795:2 32795:30 32795:68 32795:77 32795:79 32795:103 32795:108 32795:118 32798:2 32798:30 32798:68 32798:77 32798:79 32798:103 32798:108 32798:118, peerColumns=2 30 68 77 79 103 108 118, peerId=16Uiu2HAm8SYqc35vEVbCtVerzTJgxm3gGYrKAUYJx9pEjn5cScP2, peerClient=Lighthouse, prevPartialDownload=true
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: Aug-14 08:19:19.191[sync] debug: processing matchBlockWithDataColumns blobKzgCommitmentsLen=2, dataColumnSidecars=0, shouldHaveAllData=false, neededColumns=0 1 2 3 4 6 7 9 11 12 13 14 15 17 19 20 22 23 25 26 27 28 29 30 32 33 34 35 36 37 39 40 41 42 43 44 46 47 48 51 52 54 56 57 59 61 62 63 65 66 67 68 69 71 73 74 77 78 79 80 85 86 87 89 90 91 92 93 94 95 96 97 101 102 103 104 107 108 109 110 111 112 114 116 117 118 119 121 123 124 125 126 127, requestedColumns=2 30 68 77 79 103 108 118, slot=32771, dataColumnsSlots=, peerClient=Lighthouse
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: Aug-14 08:19:19.191[sync] debug: matchBlockWithDataColumns2 dataColumnIndexes=, requestedColumnsPresent=false, slot=32771, peerClient=Lighthouse
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: Aug-14 08:19:19.191[sync] debug: Missing or mismatching dataColumnSidecars from peerId=16Uiu2HAm8SYqc35vEVbCtVerzTJgxm3gGYrKAUYJx9pEjn5cScP2 for blockSlot=32771 with numColumns=128 dataColumnSidecars=0 requestedColumnsPresent=false received dataColumnIndexes= requested=2 30 68 77 79 103 108 118 allBlocks=8, allDataColumnSidecars=160, peerId=16Uiu2HAm8SYqc35vEVbCtVerzTJgxm3gGYrKAUYJx9pEjn5cScP2, blobKzgCommitmentsLen=2, peerClient=Lighthouse
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: Aug-14 08:19:19.191[sync] verbose: Batch download error id=Head-0, startEpoch=1024, status=Downloading, peer=16...5cScP2 - Missing or mismatching dataColumnSidecars from peerId=16Uiu2HAm8SYqc35vEVbCtVerzTJgxm3gGYrKAUYJx9pEjn5cScP2 for blockSlot=32771 blobKzgCommitmentsLen=2 with numColumns=128 dataColumnSidecars=0 requestedColumnsPresent=false received dataColumnIndexes= requested=2 30 68 77 79 103 108 118
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: Error: Missing or mismatching dataColumnSidecars from peerId=16Uiu2HAm8SYqc35vEVbCtVerzTJgxm3gGYrKAUYJx9pEjn5cScP2 for blockSlot=32771 blobKzgCommitmentsLen=2 with numColumns=128 dataColumnSidecars=0 requestedColumnsPresent=false received dataColumnIndexes= requested=2 30 68 77 79 103 108 118
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: at matchBlockWithDataColumns (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/beaconBlocksMaybeBlobsByRange.ts:431:15)
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: at beaconBlocksMaybeBlobsByRange (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/beaconBlocksMaybeBlobsByRange.ts:176:20)
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: at wrapError (file:///usr/src/lodestar/packages/beacon-node/src/util/wrapError.ts:18:32)
Aug 14 08:19:19 devnet-ax41-1 beacon_run.sh[2827727]: at SyncChain.sendBatch (file:///usr/src/lodestar/packages/beacon-node/src/sync/range/chain.ts
We treat every thrown error as a "downloadError" and eventually abort the range sync. RangeSync then creates another chain, which keeps throwing the same error, because a supernode cannot download all data column sidecars in a single round.
Expected behavior
In head sync, we should not treat a mismatch error as a download error. The chain should keep trying to fetch from peers with a compatible view (on the same branch). When it finishes one fork, the chain is removed and replaced with another head sync chain for a different fork.
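A hedged sketch of that error handling (the enum and function names are hypothetical, not existing Lodestar APIs): in head sync, a fork mismatch is routed to "retry with another peer" instead of being counted as a fatal download error that removes the chain.

```ts
enum BatchErrorCode {
  DOWNLOAD_ERROR = "download_error",
  FORK_MISMATCH = "fork_mismatch",
}

type SyncType = "Head" | "Finalized";

function onBatchDownloadError(syncType: SyncType, code: BatchErrorCode): "removeChain" | "retryOtherPeer" {
  if (syncType === "Head" && code === BatchErrorCode.FORK_MISMATCH) {
    // don't count a fork mismatch against the batch: keep the chain alive and
    // retry with peers whose status advertises the same head branch; once this
    // fork finishes, RangeSync swaps in a head-sync chain for the next fork
    return "retryOtherPeer";
  }
  // genuine download failures still follow the existing abort policy
  return "removeChain";
}
```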
Steps to reproduce
No response
Additional context
No response
Operating system
Linux
Lodestar version or commit hash
unstable and fusaka-devnet-4
The current post-fulu design of RangeSync handles a single fork only:
- for each `PartialDownload` we maintain a single array of `BlockInput`
- then we cache all `BlockInput`s and some pending data columns, and if on the next retry other peers return a different fork, we throw the "mismatch" error (see the sketch after this list)
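Roughly, the current single-fork cache has this shape (field names are illustrative, not the exact Lodestar types): there is exactly one `BlockInput` array and one pending-columns set per batch, so a response from another fork has nowhere to go and triggers the "mismatch" error.

```ts
type BlockInput = {
  slot: number;
  blockRoot: string;
  // block body plus the data columns received so far (elided)
};

interface PartialDownload {
  blocks: BlockInput[]; // a single fork's blocks only
  pendingDataColumns: number[]; // column indexes still missing for that fork
}
```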
But as a supernode, the node usually has to query multiple peers to get all columns; if one of them is on a conflicting fork, we throw the mismatch error and eventually abort the batch.
This is what happens if we keep using the same batch: we stick to the initial block download, which could be on a minority fork, and the chain has to retry with different peers, but all of them are on other forks:
grep -e 'ByRange requests beaconBlocksRequest={"start_slot":"32768","count":"32","step":"1"}' -rn beacon-2025-08-14.log | wc -l
7437
It's worth exploring storing multiple forks per batch and never throwing the "mismatch" error (instead, just add a new fork = new `BlockInput`s + `pendingDataColumns`).
The batch is then finished when one of the forks is complete. The node will still end up following a single/majority fork, but at least this prevents the sync from stalling as it does now.
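A minimal sketch of that multi-fork idea (an assumed design, not implemented; `forkKey` could be, e.g., the last block root of the chain segment, and `BlockInput` is the shape sketched above): instead of throwing "mismatch", a response from a new fork just opens a new entry, and the batch completes as soon as any fork has no pending columns left.

```ts
interface ForkDownload {
  blocks: BlockInput[]; // BlockInput as sketched above
  pendingDataColumns: number[];
}

class MultiForkPartialDownload {
  // one entry per fork, keyed e.g. by the segment's last block root
  private readonly byFork = new Map<string, ForkDownload>();

  addResponse(forkKey: string, blocks: BlockInput[], pendingDataColumns: number[]): void {
    const existing = this.byFork.get(forkKey);
    if (existing === undefined) {
      // previously this case threw the "mismatch" error; now it just opens a new fork entry
      this.byFork.set(forkKey, {blocks, pendingDataColumns});
      return;
    }
    // same fork: keep only the columns that are still missing after this response
    // (the block/column merge logic is elided)
    existing.pendingDataColumns = existing.pendingDataColumns.filter((c) =>
      pendingDataColumns.includes(c)
    );
  }

  // the batch is done as soon as any fork has no pending columns left
  getFinishedFork(): ForkDownload | null {
    for (const download of this.byFork.values()) {
      if (download.pendingDataColumns.length === 0) return download;
    }
    return null;
  }
}
```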