lighthouse
lighthouse copied to clipboard
Stuck lookup reports
Collection of investigations of Notify the devs, a sync lookup is stuck events:
ho-h-bn-toga - 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997
May 24 01:57:15.980 WARN Notify the devs, a sync lookup is stuck, ancestor_lookup: SingleBlockLookup {
id: 56,
block_request_state: BlockRequestState {
state: SingleLookupRequestState {
state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
}
},
blob_request_state: BlobRequestState {
state: SingleLookupRequestState {
state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
}
},
peers: 7,
Lookup 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997
^ this block has 1 blob
- We had a child lookup with cached block (id = 55), and zero peers.
- When parent is processed, child continues
- The child block is sent for processing
- The blob request fails because it has no peers, and the lookup is dropped
- When the block processing result returns, it triggers and unknown lookup error
- Immediately then we get "unknown block hash message" for same root, and create a new lookup with (id = 56)
May 24 01:41:01.108 DEBG Unknown parent for gossip block root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997
May 24 01:41:01.108 DEBG Searching for components of a block with unknown parent, id: 55, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, peer_ids: [], service: lookup_sync, service: sync
..
May 24 01:41:01.518 DEBG Received unknown block hash message peer: PeerId("16Uiu2HAmKdxFEg3UKJRQ1BUGQaFjU2TBpE7qcz3YibWqaqX1T6RH"), block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, service: sync
May 24 01:41:01.518 DEBG Searching for block components id: 56, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, peer_ids: [PeerId("16Uiu2HAmKdxFEg3UKJRQ1BUGQaFjU2TBpE7qcz3YibWqaqX1T6RH")], service: lookup_sync, service: sync
..
May 24 01:41:01.900 DEBG Beacon block awaiting blobs block_slot: 1717705, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, service: beacon
May 24 01:41:01.900 DEBG Dropping lookup on request error error: UnknownLookup, source: processing_result, id: 55, service: lookup_sync, service: sync
Stuck cause:
The new lookup 56 is created while the block is in the processing cache, block_lookup_request returns Pending. However, the block is processing from a previous RPC lookup. The event that is returned from processing gets ignored because it's for a previous ID.
Fix
TBD
ho-h-bn-toga - 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf
May 24 04:00:45.978 WARN Notify the devs, a sync lookup is stuck, lookup: SingleBlockLookup {
id: 403,
block_request_state: BlockRequestState {
state: SingleLookupRequestState {
state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
}
},
blob_request_state: BlobRequestState {
state: SingleLookupRequestState {
state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
}
},
peers: 1,
block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf,
awaiting_parent: None
This block has 2 blobs
- We received and verified a gossip block (insert into da_checker)
- We received blobs from gossip
- Then, receive an attestation that triggers a lookup for the block. The request does not progress as
block_lookup_requestreturns pending because block is in da_checker. - When blob processing completes, block is imported but no event is sent to sync causing the lookup to be stuck.
GossipBlockProcessResultis only sent at the end of block gossip import, but blob gossip can also remove the block from the da_checker.
May 24 03:44:37.566 DEBG Successfully verified gossip block commitments: [0xa9da…aada, 0xb3d2…6dd3], root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, slot: 1718323, graffiti: int-hol-vouch-test-03, service: beacon
May 24 03:44:37.713 DEBG Beacon block awaiting blobs block_slot: 1718323, block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, service: beacon
May 24 03:44:37.927 DEBG Successfully verified gossip blob commitment: 0xa9da…aada, index: 0, root: 0x4449…0baf, slot: 1718323
May 24 03:44:37.931 DEBG Successfully verified gossip blob commitment: 0xb3d2…6dd3, index: 1, root: 0x4449…0baf, slot: 1718323
May 24 03:44:37.934 DEBG Received unknown block hash message peer: PeerId("16Uiu2HAm1Ng1VETpeb6iu2A31PcrnSV1oh6GyzVLaLDeWp9R6h9J"), block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, service: sync
May 24 03:44:37.934 DEBG Searching for block components id: 403, block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, peer_ids: [PeerId("16Uiu2HAm1Ng1VETpeb6iu2A31PcrnSV1oh6GyzVLaLDeWp9R6h9J")], service: lookup_sync, service: sync
May 24 03:44:38.111 DEBG Writing blobs to store count: 2, block_root: 0x4449…0baf, service: beacon
May 24 03:44:38.112 INFO Gossipsub blob processed, imported fully available block, block_root: 0x4449…0baf
Stuck cause
GossipBlockProcessResult should be sent both from gossip block import and gossip blob import
Fix
TBD
Same status and root cause for this other incidents:
- ho-h-bn-toga - 0x53c24ca2fdc60a30e6f567f2ef82b2622df0e34497dc282df1c55061f867e9eb
- ho-h-bn-toga - 0x50686dd264241032e5d0512e7990df980ef1587fc38ec70176c3d72897c5a892
- ho-h-bn-toga - 0x90a943df3c898961bd7780e75b38e2942289032f4801b5f6840c588b4be88355
- ho-h-bn-toga - 0x28b2050145adac935f106401a0ba10ec09c84f760b1bd27213b2e7e74705911b
A specific subcase documented here:
- https://github.com/sigp/lighthouse/issues/6703