cronos Problem: tendermint tx indexer could miss block when restart

This is an older block at the time of query (119510)

[root@bprod-cronos-1 ~]# curl localhost:8545 -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params": ["0x1D2D6", true],"id":1}' -s | jq

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "difficulty": "0x0",
    "extraData": "0x",
    "gasLimit": "0xffffffff",
    "gasUsed": "0x24d2b8",
    "hash": "0x4324b6e0116d22f6ce615f227276f8f974659c237f71acbd73d02981e225a05c",
    "logsBloom": "0x002000000200000400080000800200020000000c340800001020000080200310c00050008000010000200000000004020840000040e00000200000040021400200201400008200801040000c00001020000002000241000082010040800020000810040002200000000001001080080000020000000816002400401000000000812002000000c004000000000102091000000801d0006008100000408000000006000802101200010020044000002000800602000108000012220020000000800000000200028000880001000102200000020008c010101400002242020020000010110000801000000000000001000000004000400000400800004000002000",
    "miner": "0x81e3e543647e466a5abc824f5844ab0a091b6c6c",
    "mixHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
    "nonce": "0x0000000000000000",
    "number": "0x1d2d6",
    "parentHash": "0xad2432f13c825d3a9163c9e6d518f08633b00ad6a1eb87c747ad7f2427dabe90",
    "receiptsRoot": "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
    "sha3Uncles": "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
    "size": "0x909c",
    "stateRoot": "0xea8aae6836bdc465215558bc1c7511717069c914eaae48e984af1415b6564a42",
    "timestamp": "0x6192aef2",
    "totalDifficulty": "0x0",
    "transactions": [],
    "transactionsRoot": "0x8b2974a39d3aeed1cbe3aa1654ab933a9bcb4d7215b411b8a6cf569c1783abde",
    "uncles": []
  }
}
^ Compared to the expected response, the transactions array is now empty.

The node has newer blocks and keeps ingesting new blocks, though:
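A quick way to flag this symptom from the JSON-RPC response alone is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: that a non-zero gasUsed means the block contained at least one transaction. The helper name looks_unindexed is made up for illustration.

```python
def looks_unindexed(block: dict) -> bool:
    """Return True when a block reports gas usage but lists no transactions.

    Heuristic only: assumes non-zero gasUsed implies at least one transaction.
    """
    gas_used = int(block.get("gasUsed", "0x0"), 16)
    transactions = block.get("transactions") or []
    return gas_used > 0 and len(transactions) == 0


# Fields copied from the eth_getBlockByNumber response above.
block_119510 = {"number": "0x1d2d6", "gasUsed": "0x24d2b8", "transactions": []}
print(looks_unindexed(block_119510))
```

A block that this check flags is only a candidate; it still needs to be confirmed against the node's own block data.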

{
  "blockNumber": 125710,
  "blockHash": "0x887475723ba801bb1b2ce68709b74941223652f09317127aaf15a4d49bfeb85e",
  "blockTime": "2021-11-16T04:47:34.000Z",
  "checkTime": "2021-11-16T04:47:50.836Z",
  "timeDiff": 16836
}

To Reproduce: It occurs intermittently, and only on some of the nodes (x86_64).

Nov 16 '21 05:11 jun0tpyrc

Noticed this happened a few times yesterday as well; we had to manually resync the block on the subgraph whenever this happened.

Nov 18 '21 03:11 crypto-steve-ng

Could be an error in Tendermint. @JayT106, is it possible that either e.clientCtx.Client.Block(e.ctx, &height) or e.clientCtx.Client.TxSearch returns different values on different nodes?

Nov 18 '21 03:11 thomas-nguy

First, the block data should be the same across all nodes in the network; otherwise consensus would break. We might debug EthBlockFromTendermint to see why the block data transformation fails.

Second, TxSearch searches for the transaction in the indexer, so the node must have the KV indexer enabled. The indexer might fail to index the transaction and block data if something goes wrong during indexing (when a new block is committed). So it's possible to get different results from different nodes.
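The difference between the two data sources can be checked per height by comparing the tx count in the block itself with the count the indexer returns. The sketch below uses the Tendermint RPC /block and /tx_search endpoints; the RPC address (default local port 26657) and all helper names are assumptions, not something taken from this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Tendermint RPC listens on its default local port.
RPC = "http://localhost:26657"


def rpc_get(path: str, **params: str) -> dict:
    """GET a Tendermint RPC endpoint and return its "result" object."""
    url = f"{RPC}/{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["result"]


def block_tx_count(block_result: dict) -> int:
    """Number of txs stored in the block itself (consensus data)."""
    return len(block_result["block"]["data"].get("txs") or [])


def indexed_tx_count(search_result: dict) -> int:
    """Number of txs the indexer reports for the same height."""
    return int(search_result["total_count"])


def is_indexed(height: int) -> bool:
    """True when the indexer knows at least as many txs as the block holds."""
    block = rpc_get("block", height=str(height))
    search = rpc_get("tx_search", query=f'"tx.height={height}"')
    return indexed_tx_count(search) >= block_tx_count(block)
```

Calling is_indexed(119510) on an affected node would be expected to return False if the block holds txs that the indexer never recorded.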

Nov 18 '21 15:11 JayT106

The Tendermint tx indexer service runs asynchronously with block commit, so it could lag behind. Since the eth_getBlockByNumber RPC API relies on /tx_search to find the txs, it's possible that it can't find them, especially for recent blocks.
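For the lag case, a client can poll a recent height a few times before treating an empty result as final. This is a minimal sketch: is_indexed is a hypothetical callable supplied by the caller (for example, one that compares /block with /tx_search), and the attempt count and delay are arbitrary.

```python
import time
from typing import Callable


def wait_until_indexed(
    is_indexed: Callable[[int], bool],
    height: int,
    attempts: int = 5,
    delay_s: float = 1.0,
) -> bool:
    """Poll until the indexer has caught up with `height`, or give up."""
    for attempt in range(attempts):
        if is_indexed(height):
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

A False result after all attempts suggests the height was skipped, not merely delayed, which retrying will not fix.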

Nov 24 '21 02:11 yihuang

It reproduced on one of our nodes: the Tendermint tx indexer failed to index a whole block and did not recover automatically, matching what the OP described.

Nov 24 '21 04:11 yihuang

https://github.com/tendermint/tendermint/issues/7312 We found that when the node restarts, there's a chance that the tx indexer misses a block.
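Since the miss is tied to restarts, one way to look for affected blocks is to scan the heights around each restart. The sketch below is illustrative only: both count functions are hypothetical callables supplied by the caller, and the sample numbers are invented for the demonstration, not taken from a real node.

```python
from typing import Callable, List


def find_unindexed_heights(
    first: int,
    last: int,
    block_tx_count: Callable[[int], int],
    indexed_tx_count: Callable[[int], int],
) -> List[int]:
    """Return heights in [first, last] where the indexer holds fewer txs than the block."""
    return [
        height
        for height in range(first, last + 1)
        if indexed_tx_count(height) < block_tx_count(height)
    ]


# Made-up counts: the block at height 101 has 3 txs, none of them indexed.
in_block = {100: 2, 101: 3, 102: 0}
in_index = {100: 2, 101: 0, 102: 0}
missing = find_unindexed_heights(100, 102, in_block.__getitem__, in_index.__getitem__)
print(missing)
```

Heights reported this way are the ones whose data would need to be re-indexed or resynced.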

Nov 24 '21 06:11 yihuang

I tested that it has been fixed in this PR. Not sure whether it will be backported to v0.34: https://github.com/tendermint/tendermint/issues/7312#issuecomment-978056789

Nov 24 '21 16:11 JayT106
