
Full node on mainnet stops syncing with BAD BLOCK

Open kwunyeung opened this issue 4 years ago • 5 comments

The fullnode stopped syncing at block number 1952640 with the error message below.

Logs

Are there any logs?

ERROR[08-17|09:38:21.814] The header retrieved from the chain is nil block num=1952640
ERROR[08-17|09:38:21.814] 
########## BAD BLOCK #########
Chain config: {ChainID: 42220 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0 Engine: istanbul}

Number: 1966203
Hash: 0xc778e99dfd1dee0a97df2c5e68a3a933d5a1ae2022b1e3953290060e8b8425b0


Error: unknown block
##############################

System information

Run geth version

Celo
Version: 1.0.1-stable
Architecture: amd64
Protocol Versions: [65 64]
Go Version: go1.13.14
Operating System: linux
GOPATH=
GOROOT=/usr/local/go

kwunyeung avatar Aug 17 '20 09:08 kwunyeung

Hi @kwunyeung. Sorry for taking so long to address this issue. Are you still seeing it, even after deleting your on-disk chain data?

I think that the root cause of this issue is this: https://github.com/celo-org/celo-blockchain/issues/1107

What's happening is that on the last block of each epoch (specifically, block numbers where number mod 17280 == 0), the node calculates validator rewards for all of the validators in the ending epoch.

As part of that calculation, your node needs to calculate the uptime scores of all of the epoch's validators, which use data saved in your node's local LevelDB. However, there are rare corner cases in which the data in LevelDB is incorrect, which leads to your node calculating incorrect uptime scores (and in turn to this BAD BLOCK error).
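To make the epoch-boundary rule concrete, here is a minimal sketch in Go that applies the mod 17280 check to the two block numbers from the original report. The constant and function names are hypothetical and chosen for illustration; they are not taken from the celo-blockchain source, and the epoch size is the value quoted in this comment.

```go
package main

import "fmt"

// epochSize is the mainnet epoch length quoted in this thread (an assumption
// here, not a value read from the node's configuration).
const epochSize uint64 = 17280

// isLastBlockOfEpoch is a hypothetical helper: it reports whether a block
// number falls on an epoch boundary, i.e. number mod epochSize == 0.
func isLastBlockOfEpoch(number uint64) bool {
	return number%epochSize == 0
}

func main() {
	// Block numbers copied from the logs in the original report.
	blocks := []uint64{
		1952640, // "The header retrieved from the chain is nil"
		1966203, // number printed in the BAD BLOCK banner
	}
	for _, n := range blocks {
		fmt.Printf("block %d: epoch boundary = %v (remainder %d)\n",
			n, isLastBlockOfEpoch(n), n%epochSize)
	}
}
```

Under that assumption, 1952640 is an exact multiple of 17280 (113 × 17280), which is consistent with the failure being tied to end-of-epoch processing, while 1966203 from the BAD BLOCK banner is not on a boundary.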

I'm going to close this for now. If you encounter it again even after deleting your chain data and resyncing, then please re-open.

kevjue avatar Dec 15 '20 22:12 kevjue

@kevjue we have synced the node from scratch and it's currently working fine. Thanks!

kwunyeung avatar Dec 22 '20 09:12 kwunyeung

I saw this now on a full node (on a recent version syncing to mainnet):

ERROR[03-23|19:23:33.610] The header retrieved from the chain is nil block num=1002240
ERROR[03-23|19:23:33.610]
########## BAD BLOCK #########
Chain config: {ChainID: 42220 Homestead: 0 DAO: <nil> DAOSupport: true EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0 Churrito: <nil>, Donut: <nil>, Engine: istanbul}

Number: 1004268
Hash: 0x10b43d1b808c0f1e1be98f9f36b84bb591e07051f0341cc7af30d9cb87a9b6d7


Error: unknown block
##############################

The node is stuck on block 978688, and reports an error on block 1004268. Not clear why it got stuck syncing, though.

Presumably deleting the chain data would work, but then it'd have to sync from the beginning again, and it seems the issue is not really resolved.

oneeman avatar Mar 23 '21 19:03 oneeman

I just hit this on mainnet; the latest block in Geth is 2333423:

ERROR[04-29|12:19:53.531] The header retrieved from the chain is nil block num=2350080
ERROR[04-29|12:19:53.531]
########## BAD BLOCK #########
Chain config: {ChainID: 42220 Homestead: 0 DAO: <nil> DAOSupport: true EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0 Churrito: 6774000, Donut: 6774000, Engine: istanbul}
Number: 3360065
Hash: 0x058bcdb82cef7af601306c1a6ef2ce4139beb85ba4332e6308b65e3db15ec8fd
Error: unknown block
##############################
WARN [04-29|12:19:53.535] Error in sending message                 func=AsyncMulticastCeloMsg msgCode=18 peer="Peer 7a2c1573a9b944c0 [eth/65]" ethMsgCode=18 err="shutting down"
WARN [04-29|12:19:53.539] Synchronisation failed, dropping peer    peer=167d06a00d57b861                 err="retrieved hash chain is invalid: unknown block"

[Edit: the node was on v1.0.0-stable]

martinvol avatar Apr 29 '21 13:04 martinvol

@trianglesphere @gastonponti please check whether this still happens on the new release version. If not, close.

mcortesi avatar Nov 27 '21 13:11 mcortesi