cronos
Problem: tendermint tx indexer could miss a block on restart
This is an older block at the time of query (119510)
[root@bprod-cronos-1 ~]# curl localhost:8545 -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params": ["0x1D2D6", true],"id":1}' -s | jq
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"difficulty": "0x0",
"extraData": "0x",
"gasLimit": "0xffffffff",
"gasUsed": "0x24d2b8",
"hash": "0x4324b6e0116d22f6ce615f227276f8f974659c237f71acbd73d02981e225a05c",
"logsBloom": "0x002000000200000400080000800200020000000c340800001020000080200310c00050008000010000200000000004020840000040e00000200000040021400200201400008200801040000c00001020000002000241000082010040800020000810040002200000000001001080080000020000000816002400401000000000812002000000c004000000000102091000000801d0006008100000408000000006000802101200010020044000002000800602000108000012220020000000800000000200028000880001000102200000020008c010101400002242020020000010110000801000000000000001000000004000400000400800004000002000",
"miner": "0x81e3e543647e466a5abc824f5844ab0a091b6c6c",
"mixHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
"nonce": "0x0000000000000000",
"number": "0x1d2d6",
"parentHash": "0xad2432f13c825d3a9163c9e6d518f08633b00ad6a1eb87c747ad7f2427dabe90",
"receiptsRoot": "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
"sha3Uncles": "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
"size": "0x909c",
"stateRoot": "0xea8aae6836bdc465215558bc1c7511717069c914eaae48e984af1415b6564a42",
"timestamp": "0x6192aef2",
"totalDifficulty": "0x0",
"transactions": [],
"transactionsRoot": "0x8b2974a39d3aeed1cbe3aa1654ab933a9bcb4d7215b411b8a6cf569c1783abde",
"uncles": []
}
}
^ Compared to the expected response, the transactions array is now empty.
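A rough way to flag this symptom automatically is to look for blocks that report non-zero `gasUsed` but an empty `transactions` array. The sketch below is only a heuristic under that assumption (a block holding only non-EVM transactions might also trigger it); the helper names `hexToUint` and `looksUnindexed` are made up for illustration, and the sample payload is a trimmed copy of the response above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// ethBlock keeps only the fields of an eth_getBlockByNumber result
// that this heuristic needs.
type ethBlock struct {
	Number       string            `json:"number"`
	GasUsed      string            `json:"gasUsed"`
	Transactions []json.RawMessage `json:"transactions"`
}

type rpcResponse struct {
	Result ethBlock `json:"result"`
}

// hexToUint parses a 0x-prefixed hex quantity such as "0x1d2d6".
func hexToUint(s string) (uint64, error) {
	return strconv.ParseUint(strings.TrimPrefix(strings.ToLower(s), "0x"), 16, 64)
}

// looksUnindexed returns the block height and whether the block reports
// gas usage while listing no transactions (hypothetical helper).
func looksUnindexed(raw []byte) (uint64, bool, error) {
	var resp rpcResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return 0, false, err
	}
	height, err := hexToUint(resp.Result.Number)
	if err != nil {
		return 0, false, err
	}
	gasUsed, err := hexToUint(resp.Result.GasUsed)
	if err != nil {
		return height, false, err
	}
	return height, gasUsed > 0 && len(resp.Result.Transactions) == 0, nil
}

func main() {
	// Trimmed copy of the response shown above.
	sample := []byte(`{"jsonrpc":"2.0","id":1,"result":{"number":"0x1d2d6","gasUsed":"0x24d2b8","transactions":[]}}`)
	height, suspicious, err := looksUnindexed(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("block %d suspicious=%v\n", height, suspicious) // block 119510 suspicious=true
}
```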
The node has newer blocks and keeps ingesting new blocks, though:
{
"blockNumber": 125710,
"blockHash": "0x887475723ba801bb1b2ce68709b74941223652f09317127aaf15a4d49bfeb85e",
"blockTime": "2021-11-16T04:47:34.000Z",
"checkTime": "2021-11-16T04:47:50.836Z",
"timeDiff": 16836
}
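For reference, `timeDiff` in the status output above appears to be `checkTime` minus `blockTime` in milliseconds, i.e. the node's head was about 17 seconds old when checked. A minimal sketch of that arithmetic (the function name `lagMillis` is made up for illustration):

```go
package main

import (
	"fmt"
	"time"
)

// lagMillis returns checkTime - blockTime in milliseconds.
// Both arguments are RFC 3339 timestamps, as in the status output.
func lagMillis(blockTime, checkTime string) (int64, error) {
	b, err := time.Parse(time.RFC3339Nano, blockTime)
	if err != nil {
		return 0, err
	}
	c, err := time.Parse(time.RFC3339Nano, checkTime)
	if err != nil {
		return 0, err
	}
	return c.Sub(b).Milliseconds(), nil
}

func main() {
	ms, err := lagMillis("2021-11-16T04:47:34.000Z", "2021-11-16T04:47:50.836Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(ms) // 16836
}
```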
To Reproduce: it occurs intermittently, and only on some of the nodes (x86_64).
We noticed this happened a few times yesterday as well; we had to manually resync the block on the subgraph whenever it happened.
Could be an error in tendermint.
@JayT106 is it possible that either e.clientCtx.Client.Block(e.ctx, &height) or e.clientCtx.Client.TxSearch returns different values on different nodes?
First, the block data should be the same across all nodes in the network; otherwise consensus would break. We could debug EthBlockFromTendermint to see why the block data transformation fails.
Second, TxSearch looks the transactions up in the indexer, so the node must have the KV indexer enabled. The indexer might fail to index the transaction and block data if something goes wrong during indexing (when a new block is committed). So it is possible to get different results from different nodes.
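One way to check this on a given node is to compare the number of transactions in the block itself with the number the indexer returns for that height. The sketch below assumes the Tendermint RPC endpoints `/block?height=119510` and `/tx_search?query="tx.height=119510"` (typically served on port 26657) and assumes `total_count` is encoded as a string, as in v0.34. The payloads in `main` are made-up stand-ins for those two responses, not real data from the affected node, and `missingFromIndex` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// blockResp mirrors the part of a /block response that lists raw txs.
type blockResp struct {
	Result struct {
		Block struct {
			Data struct {
				Txs []string `json:"txs"`
			} `json:"data"`
		} `json:"block"`
	} `json:"result"`
}

// txSearchResp mirrors the part of a /tx_search response with the hit count.
// Assumption: total_count is a JSON string, as in Tendermint v0.34.
type txSearchResp struct {
	Result struct {
		TotalCount string `json:"total_count"`
	} `json:"result"`
}

// missingFromIndex returns how many of the block's transactions the
// indexer does not know about (hypothetical helper).
func missingFromIndex(blockJSON, searchJSON []byte) (int, error) {
	var b blockResp
	if err := json.Unmarshal(blockJSON, &b); err != nil {
		return 0, err
	}
	var s txSearchResp
	if err := json.Unmarshal(searchJSON, &s); err != nil {
		return 0, err
	}
	indexed, err := strconv.Atoi(s.Result.TotalCount)
	if err != nil {
		return 0, err
	}
	return len(b.Result.Block.Data.Txs) - indexed, nil
}

func main() {
	// Illustrative payloads only: a block with two txs, none of them indexed.
	block := []byte(`{"result":{"block":{"data":{"txs":["AQ==","Ag=="]}}}}`)
	search := []byte(`{"result":{"txs":[],"total_count":"0"}}`)
	missing, err := missingFromIndex(block, search)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d transaction(s) missing from the index\n", missing)
}
```

A non-zero result on one node and zero on another would be consistent with the indexer, rather than the block data, being the source of the difference.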
The tendermint tx indexer service runs asynchronously with respect to block commit, so it can lag behind.
And since the eth_getBlockByNumber RPC API relies on /tx_search to find the transactions, it is possible that it can't find them, especially for recent blocks.
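If the only problem is that the indexer is briefly behind, a client could poll for a short while before treating an empty result as final. This is a minimal sketch of that idea, not something Cronos or Tendermint provides: `waitIndexed` and `errNotIndexed` are made-up names, and the `check` callback stands in for whatever query the client uses. It does not help when the indexer has skipped a block for good.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotIndexed is returned when all attempts ran out (hypothetical name).
var errNotIndexed = errors.New("transactions not indexed yet")

// waitIndexed calls check up to attempts times, sleeping delay between
// calls, and returns nil as soon as check reports true.
func waitIndexed(check func() (bool, error), attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		ok, err := check()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return errNotIndexed
}

func main() {
	// Fake check that succeeds on the third call, standing in for a
	// real query against the node.
	calls := 0
	fake := func() (bool, error) {
		calls++
		return calls >= 3, nil
	}
	err := waitIndexed(fake, 5, 10*time.Millisecond)
	fmt.Println(calls, err) // 3 <nil>
}
```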
This reproduced on one of our nodes: the tendermint tx indexer failed to index a whole block and did not recover automatically, matching what the OP described.
We found that when a node restarts, there is a chance that the tx indexer misses a block: https://github.com/tendermint/tendermint/issues/7312
I tested that it is fixed by this PR; not sure whether it will be backported to v0.34: https://github.com/tendermint/tendermint/issues/7312#issuecomment-978056789