lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

Stuck lookup reports

Open dapplion opened this issue 1 year ago • 1 comments

Collection of investigations of Notify the devs, a sync lookup is stuck events:

ho-h-bn-toga - 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997

May 24 01:57:15.980 WARN Notify the devs, a sync lookup is stuck, ancestor_lookup: SingleBlockLookup { 
    id: 56, 
    block_request_state: BlockRequestState { 
        state: SingleLookupRequestState { 
            state: AwaitingDownload, failed_processing: 0, failed_downloading: 0 
            } 
        }, 
    blob_request_state: BlobRequestState { 
        state: SingleLookupRequestState { 
            state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
        } 
    }, 
    peers: 7, 

Lookup 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997

^ this block has 1 blob

  • We had a child lookup with cached block (id = 55), and zero peers.
  • When parent is processed, child continues
  • The child block is sent for processing
  • The blob request fails because it has no peers, and the lookup is dropped
  • When the block processing result returns, it triggers and unknown lookup error
  • Immediately then we get "unknown block hash message" for same root, and create a new lookup with (id = 56)
May 24 01:41:01.108 DEBG Unknown parent for gossip block         root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997
May 24 01:41:01.108 DEBG Searching for components of a block with unknown parent, id: 55, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, peer_ids: [], service: lookup_sync, service: sync
..
May 24 01:41:01.518 DEBG Received unknown block hash message     peer: PeerId("16Uiu2HAmKdxFEg3UKJRQ1BUGQaFjU2TBpE7qcz3YibWqaqX1T6RH"), block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, service: sync
May 24 01:41:01.518 DEBG Searching for block components          id: 56, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, peer_ids: [PeerId("16Uiu2HAmKdxFEg3UKJRQ1BUGQaFjU2TBpE7qcz3YibWqaqX1T6RH")], service: lookup_sync, service: sync
..
May 24 01:41:01.900 DEBG Beacon block awaiting blobs             block_slot: 1717705, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, service: beacon
May 24 01:41:01.900 DEBG Dropping lookup on request error        error: UnknownLookup, source: processing_result, id: 55, service: lookup_sync, service: sync

Stuck cause:

The new lookup 56 is created while the block is in the processing cache, block_lookup_request returns Pending. However, the block is processing from a previous RPC lookup. The event that is returned from processing gets ignored because it's for a previous ID.

Fix

TBD

dapplion avatar May 24 '24 13:05 dapplion

ho-h-bn-toga - 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf

May 24 04:00:45.978 WARN Notify the devs, a sync lookup is stuck, lookup: SingleBlockLookup { 
    id: 403, 
    block_request_state: BlockRequestState { 
        state: SingleLookupRequestState { 
            state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
        }
    },
    blob_request_state: BlobRequestState { 
        state: SingleLookupRequestState {
            state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
        }
    }, 
    peers: 1,
    block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf,
    awaiting_parent: None

This block has 2 blobs

  • We received and verified a gossip block (insert into da_checker)
  • We received blobs from gossip
  • Then, receive an attestation that triggers a lookup for the block. The request does not progress as block_lookup_request returns pending because block is in da_checker.
  • When blob processing completes, block is imported but no event is sent to sync causing the lookup to be stuck. GossipBlockProcessResult is only sent at the end of block gossip import, but blob gossip can also remove the block from the da_checker.
May 24 03:44:37.566 DEBG Successfully verified gossip block      commitments: [0xa9da…aada, 0xb3d2…6dd3], root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, slot: 1718323, graffiti: int-hol-vouch-test-03, service: beacon
May 24 03:44:37.713 DEBG Beacon block awaiting blobs             block_slot: 1718323, block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, service: beacon
May 24 03:44:37.927 DEBG Successfully verified gossip blob       commitment: 0xa9da…aada, index: 0, root: 0x4449…0baf, slot: 1718323
May 24 03:44:37.931 DEBG Successfully verified gossip blob       commitment: 0xb3d2…6dd3, index: 1, root: 0x4449…0baf, slot: 1718323
May 24 03:44:37.934 DEBG Received unknown block hash message     peer: PeerId("16Uiu2HAm1Ng1VETpeb6iu2A31PcrnSV1oh6GyzVLaLDeWp9R6h9J"), block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, service: sync
May 24 03:44:37.934 DEBG Searching for block components          id: 403, block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, peer_ids: [PeerId("16Uiu2HAm1Ng1VETpeb6iu2A31PcrnSV1oh6GyzVLaLDeWp9R6h9J")], service: lookup_sync, service: sync
May 24 03:44:38.111 DEBG Writing blobs to store                  count: 2, block_root: 0x4449…0baf, service: beacon
May 24 03:44:38.112 INFO Gossipsub blob processed, imported fully available block, block_root: 0x4449…0baf

Stuck cause

GossipBlockProcessResult should be sent both from gossip block import and gossip blob import

Fix

TBD


Same status and root cause for this other incidents:

  • ho-h-bn-toga - 0x53c24ca2fdc60a30e6f567f2ef82b2622df0e34497dc282df1c55061f867e9eb
  • ho-h-bn-toga - 0x50686dd264241032e5d0512e7990df980ef1587fc38ec70176c3d72897c5a892
  • ho-h-bn-toga - 0x90a943df3c898961bd7780e75b38e2942289032f4801b5f6840c588b4be88355
  • ho-h-bn-toga - 0x28b2050145adac935f106401a0ba10ec09c84f760b1bd27213b2e7e74705911b

dapplion avatar May 24 '24 13:05 dapplion

A specific subcase documented here:

  • https://github.com/sigp/lighthouse/issues/6703

michaelsproul avatar Dec 16 '24 06:12 michaelsproul