erigon icon indicating copy to clipboard operation
erigon copied to clipboard

Erigon unwind failed wrong trie root and stage sync hang

Open zeech3 opened this issue 3 years ago • 21 comments

System information

Erigon version: ./erigon --version

[ erigon]# ./build/bin/erigon --version
erigon version 2022.99.99-dev-cef61090

OS & Version: Windows/Linux/OSX

linux

Commit hash :

 46ef7e5305b382c5db6387bcbed1e34f544c1938

Expected behaviour

Actual behaviour

[EROR] [08-19|04:01:12.869] Staged Sync                              err="[9/16 IntermediateHashes] wrong trie root: c7d3c168bb0fe9acb6e5f7e79386f0a253f3f82f64380789f1665bb850
3ba751, expected (from header): 196ecc29fb862092498125760993a66d3abaedef6591171dd7ff77f3aad80e87"

Steps to reproduce the behaviour

Backtrace

[EROR] [08-19|04:01:12.362] Staged Sync                              err="[9/16 IntermediateHashes] wrong trie root: c7d3c168bb0fe9acb6e5f7e79386f0a253f3f82f64380789f1665bb850
3ba751, expected (from header): 196ecc29fb862092498125760993a66d3abaedef6591171dd7ff77f3aad80e87"
[TRACE] [08-19|04:01:12.869] Unwind...                                stage=Finish
[TRACE] [08-19|04:01:12.869] Unwind...                                stage=TxLookup
[TRACE] [08-19|04:01:12.869] [14/16 TxLookup] Extraction finished     took=12.773µs
[TRACE] [08-19|04:01:12.869] [14/16 TxLookup] ETL Load done           bucket=BlockTransactionLookup records=0
[TRACE] [08-19|04:01:12.869] [14/16 TxLookup] Load finished           took=13.016µs
[TRACE] [08-19|04:01:12.869] Unwind...                                stage=LogIndex
[TRACE] [08-19|04:01:12.869] Unwind...                                stage=StorageHistoryIndex
[TRACE] [08-19|04:01:12.869] Unwind...                                stage=AccountHistoryIndex
[TRACE] [08-19|04:01:12.869] Unwind...                                stage=CallTraces
[TRACE] [08-19|04:01:12.869] [10/16 CallTraces] ETL Load done         bucket= records=0
[TRACE] [08-19|04:01:12.869] [10/16 CallTraces] ETL Load done         bucket= records=0
[TRACE] [08-19|04:01:12.869] Unwind...                                stage=HashState
[INFO] [08-19|04:01:12.869] [8/16 HashState] Unwinding started       from=1600 to=1599 storage=false codes=true
[TRACE] [08-19|04:01:12.869] [8/16 HashState] Extraction finished     took=1.698µs
[TRACE] [08-19|04:01:12.869] [8/16 HashState] ETL Load done           bucket=HashedCodeHash records=0
[TRACE] [08-19|04:01:12.869] [8/16 HashState] Load finished           took=10.463µs
[INFO] [08-19|04:01:12.869] [8/16 HashState] Unwinding started       from=1600 to=1599 storage=false codes=false
[TRACE] [08-19|04:01:12.869] [8/16 HashState] Extraction finished     took=1.604µs
[TRACE] [08-19|04:01:12.869] [8/16 HashState] ETL Load done           bucket=HashedAccount records=0
[TRACE] [08-19|04:01:12.869] [8/16 HashState] Load finished           took=6.888µs
[INFO] [08-19|04:01:12.869] [8/16 HashState] Unwinding started       from=1600 to=1599 storage=true codes=false
[TRACE] [08-19|04:01:12.869] [8/16 HashState] Extraction finished     took=1.764µs
[TRACE] [08-19|04:01:12.869] [8/16 HashState] ETL Load done           bucket=HashedStorage records=0
[TRACE] [08-19|04:01:12.869] [8/16 HashState] Load finished           took=11.55µs
[TRACE] [08-19|04:01:12.869] Unwind...                                stage=IntermediateHashes
[INFO] [08-19|04:01:12.869] [9/16 IntermediateHashes] Unwinding      from=1600 to=1599 csbucket=AccountChangeSet
[TRACE] [08-19|04:01:12.869] [9/16 IntermediateHashes] Extraction finished took=2.131µs
[TRACE] [08-19|04:01:12.869] [9/16 IntermediateHashes] ETL Load done  bucket= records=0
[TRACE] [08-19|04:01:12.869] [9/16 IntermediateHashes] Load finished  took=6.28µs
[INFO] [08-19|04:01:12.869] [9/16 IntermediateHashes] Unwinding      from=1600 to=1599 csbucket=StorageChangeSet
[TRACE] [08-19|04:01:12.869] [9/16 IntermediateHashes] Extraction finished took=2.069µs
[TRACE] [08-19|04:01:12.869] [9/16 IntermediateHashes] ETL Load done  bucket= records=0
[TRACE] [08-19|04:01:12.869] [9/16 IntermediateHashes] Load finished  took=6.775µs
[EROR] [08-19|04:01:12.869] Staged Sync                              err="[9/16 IntermediateHashes] wrong trie root: c7d3c168bb0fe9acb6e5f7e79386f0a253f3f82f64380789f1665bb850
3ba751, expected (from header): 196ecc29fb862092498125760993a66d3abaedef6591171dd7ff77f3aad80e87"
[DBUG] [08-19|04:01:13.060] [1/5 MiningCreateBlock] Candidate txs    amount=0
[INFO] [08-19|04:01:13.060] [1/5 MiningCreateBlock] Start mine       block=1601 baseFee=<nil> gasLimit=30000000
[INFO] [08-19|04:01:13.060] FinalizeBlockExecution gasUsed 0, header number 1601
[INFO] [08-19|04:01:13.060] before slash validator 0
[INFO] [08-19|04:01:13.060] slash validator 0
[INFO] [08-19|04:01:13.060] engine Finalize gasUsed 0, header number 1601
[TRACE] [08-19|04:01:13.060] [3/5 HashState] Extraction finished      took=15.309µs
[TRACE] [08-19|04:01:13.060] [3/5 HashState] ETL Load done            bucket=HashedCodeHash records=0
[TRACE] [08-19|04:01:13.060] [3/5 HashState] Load finished            took=13.688µs
[TRACE] [08-19|04:01:13.060] [3/5 HashState] Extraction finished      took=4.695µs
[TRACE] [08-19|04:01:13.060] [3/5 HashState] ETL Load done            bucket=HashedAccount records=0
[TRACE] [08-19|04:01:13.060] [3/5 HashState] Load finished            took=8.963µs
[TRACE] [08-19|04:01:13.060] [3/5 HashState] Extraction finished      took=3.841µs
[TRACE] [08-19|04:01:13.060] [3/5 HashState] ETL Load done            bucket=HashedStorage records=0
[TRACE] [08-19|04:01:13.060] [3/5 HashState] Load finished            took=8.967µs
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] Incremental state promotion of intermediate hashes from=1600 to=1601 csbucket=AccountChangeSet
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] Extraction finished took=3.914µs
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] ETL Load done   bucket= records=0
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] Load finished   took=8.107µs
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] Incremental state promotion of intermediate hashes from=1600 to=1601 csbucket=StorageChangeSet
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] Extraction finished took=3.167µs
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] ETL Load done   bucket= records=0
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] Load finished   took=5.934µs
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] ETL Load done   bucket=TrieAccount records=9
[TRACE] [08-19|04:01:13.060] [4/5 IntermediateHashes] ETL Load done   bucket=TrieStorage records=0
[INFO] [08-19|04:01:13.060] cfg.sealCancel: 0
[INFO] [08-19|04:01:13.060] none in-flight sealing task
[INFO] [08-19|04:01:13.060] after cfg.sealCancel: 0
[INFO] [08-19|04:01:13.060] Seal start
[INFO] [08-19|04:01:13.060] [parlia] Signed recently, must wait for others
[TRACE] [08-19|04:01:13.376] Unwind...                                stage=Finish
[TRACE] [08-19|04:01:13.376] Unwind...                                stage=TxLookup
[TRACE] [08-19|04:01:13.376] [14/16 TxLookup] Extraction finished     took=16.446µs
[TRACE] [08-19|04:01:13.376] [14/16 TxLookup] ETL Load done           bucket=BlockTransactionLookup records=0
[TRACE] [08-19|04:01:13.376] [14/16 TxLookup] Load finished           took=29.201µs
[TRACE] [08-19|04:01:13.376] Unwind...                                stage=LogIndex
[TRACE] [08-19|04:01:13.376] Unwind...                                stage=StorageHistoryIndex
[TRACE] [08-19|04:01:13.376] Unwind...                                stage=AccountHistoryIndex
[TRACE] [08-19|04:01:13.376] Unwind...                                stage=CallTraces
[TRACE] [08-19|04:01:13.376] [10/16 CallTraces] ETL Load done         bucket= records=0
[TRACE] [08-19|04:01:13.376] [10/16 CallTraces] ETL Load done         bucket= records=0
[TRACE] [08-19|04:01:13.376] Unwind...                                stage=HashState
[INFO] [08-19|04:01:13.376] [8/16 HashState] Unwinding started       from=1600 to=1599 storage=false codes=true
[TRACE] [08-19|04:01:13.376] [8/16 HashState] Extraction finished     took=2.823µs
[TRACE] [08-19|04:01:13.376] [8/16 HashState] ETL Load done           bucket=HashedCodeHash records=0
[TRACE] [08-19|04:01:13.376] [8/16 HashState] Load finished           took=7.801µs
[INFO] [08-19|04:01:13.376] [8/16 HashState] Unwinding started       from=1600 to=1599 storage=false codes=false
[TRACE] [08-19|04:01:13.376] [8/16 HashState] Extraction finished     took=1.532µs
[TRACE] [08-19|04:01:13.376] [8/16 HashState] ETL Load done           bucket=HashedAccount records=0
[TRACE] [08-19|04:01:13.376] [8/16 HashState] Load finished           took=7.615µs
[INFO] [08-19|04:01:13.376] [8/16 HashState] Unwinding started       from=1600 to=1599 storage=true codes=false
[TRACE] [08-19|04:01:13.376] [8/16 HashState] Extraction finished     took=1.931µs
[TRACE] [08-19|04:01:13.376] [8/16 HashState] ETL Load done           bucket=HashedStorage records=0
[TRACE] [08-19|04:01:13.376] [8/16 HashState] Load finished           took=9.24µs
[TRACE] [08-19|04:01:13.376] Unwind...                                stage=IntermediateHashes
[INFO] [08-19|04:01:13.376] [9/16 IntermediateHashes] Unwinding      from=1600 to=1599 csbucket=AccountChangeSet
[TRACE] [08-19|04:01:13.376] [9/16 IntermediateHashes] Extraction finished took=1.8µs
[TRACE] [08-19|04:01:13.376] [9/16 IntermediateHashes] ETL Load done  bucket= records=0
[TRACE] [08-19|04:01:13.376] [9/16 IntermediateHashes] Load finished  took=6.129µs
[INFO] [08-19|04:01:13.376] [9/16 IntermediateHashes] Unwinding      from=1600 to=1599 csbucket=StorageChangeSet
[TRACE] [08-19|04:01:13.376] [9/16 IntermediateHashes] Extraction finished took=2.475µs
[TRACE] [08-19|04:01:13.376] [9/16 IntermediateHashes] ETL Load done  bucket= records=0
[TRACE] [08-19|04:01:13.376] [9/16 IntermediateHashes] Load finished  took=6.451µs
[EROR] [08-19|04:01:13.376] Staged Sync                              err="[9/16 IntermediateHashes] wrong trie root: c7d3c168bb0fe9acb6e5f7e79386f0a253f3f82f64380789f1665bb8503ba751, expected (from header): 196ecc29fb862092498125760993a66d3abaedef6591171dd7ff77f3aad80e87"

zeech3 avatar Aug 19 '22 04:08 zeech3

mainnet? or it's some private mining devnet?

AskAlexSharov avatar Aug 19 '22 07:08 AskAlexSharov

private mining devnet

@AskAlexSharov

some useful log follow here.

[INFO] [08-19|09:03:59.011] [1/16 Headers] Waiting for headers...    from=129
[INFO] [08-19|09:03:59.099] [1/5 MiningCreateBlock] Start mine       block=130 baseFee=<nil> gasLimit=37272505
[INFO] [08-19|09:03:59.100] Sealing block with                       number=130 delay=5.89959141s headerDifficulty=1 val=0xeE06551B99525B8C3ef93f6472caC2F79Aa584AF hash=0x5553
8b17faaaf744ca533494a929219de02caf3b6c19b4b7a1264195ba04a8bd
[INFO] [08-19|09:04:02.105] [1/5 MiningCreateBlock] Start mine       block=130 baseFee=<nil> gasLimit=37272505
[INFO] [08-19|09:04:02.106] Sealing block with                       number=130 delay=2.893556898s headerDifficulty=1 val=0xeE06551B99525B8C3ef93f6472caC2F79Aa584AF hash=0x555
38b17faaaf744ca533494a929219de02caf3b6c19b4b7a1264195ba04a8bd
[INFO] [08-19|09:04:04.002] [1/16 Headers] Processed                 highest inserted=130 age=0 blockHash=0x1ab460d54305ed10d2ef97ed5e8d0819b05f3783d9459dcbefa1081d89d71915
[INFO] [08-19|09:04:04.002] [6/16 Execution] Completed on            block=130
[EROR] [08-19|09:04:04.003] [9/16 IntermediateHashes] Wrong trie root of block 130: 6e95875dac8de8b28a15aa14e16c08ab0a3f29da669fba08fdce0bce177e414b, expected (from header): a
5993611de27d478683bee89e941e574f797ea0086f9809c8f08e0e896ef57b5. Block hash: 1ab460d54305ed10d2ef97ed5e8d0819b05f3783d9459dcbefa1081d89d71915
[EROR] [08-19|09:04:04.003] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, trace: [stageloop.go:129 panic.go:8
44 panic.go:220 signal_unix.go:818 stage_interhashes.go:118 default_stages.go:126 sync.go:347 sync.go:255 stageloop.go:163 stageloop.go:84 asm_amd64.s:1571]"
[INFO] [08-19|09:04:04.503] Commit cycle                             in=7.295µs
[INFO] [08-19|09:04:04.503] RPC Daemon notified of new headers       from=129 to=130 header sending=7.23µs log sending=157ns
[INFO] [08-19|09:04:04.503] [1/16 Headers] Waiting for headers...    from=129
[INFO] [08-19|09:04:05.001] [1/16 Headers] Processed                 highest inserted=130 age=0 blockHash=0x1688b4be168b7f7ddb8183d600bf818709e258686384319ab8b8f0279925610a
[INFO] [08-19|09:04:05.001] [6/16 Execution] Completed on            block=130
[INFO] [08-19|09:04:05.001] Timings (slower than 50ms)               Headers=497ms
[INFO] [08-19|09:04:05.001] Tables                                   PlainState=12.0KB AccountChangeSet=4.0KB StorageChangeSet=12.0KB BlockTransaction=48.0KB TransactionLog=4.
0KB FreeList=4.0KB ReclaimableSpace=4.0MB
[INFO] [08-19|09:04:05.004] Commit cycle                             in=2.698264ms
[INFO] [08-19|09:04:05.004] RPC Daemon notified of new headers       from=129 to=130 header sending=7.634µs log sending=166ns
[INFO] [08-19|09:04:05.004] [1/16 Headers] Waiting for headers...    from=130
[INFO] [08-19|09:04:05.112] [1/5 MiningCreateBlock] Start mine       block=131 baseFee=<nil> gasLimit=37126911
[INFO] [08-19|09:04:05.113] [parlia] Signed recently, must wait for others
[INFO] [08-19|09:04:08.085] [1/5 MiningCreateBlock] Start mine       block=131 baseFee=<nil> gasLimit=37126911
[INFO] [08-19|09:04:08.086] [5/5 MiningFinish] block ready for seal  blocn_num=131 transactions=1 gas_used=1807136 gas_limit=37126911 difficulty=2
[INFO] [08-19|09:04:08.086] [parlia] Signed recently, must wait for others
[INFO] [08-19|09:04:11.005] UnwindTo                                 block=129 bad_block_hash=0x0000000000000000000000000000000000000000000000000000000000000000
[INFO] [08-19|09:04:11.005] [1/16 Headers] Processed                 highest inserted=131 age=1s blockHash=0xf1f6a13672b9d5601487e5dd15ca52ce8ee242cccc55e72f5e679bf459a74e20
[INFO] [08-19|09:04:11.006] [8/16 HashState] Unwinding started       from=130 to=129 storage=false codes=true
[INFO] [08-19|09:04:11.006] [8/16 HashState] Unwinding started       from=130 to=129 storage=false codes=false
[INFO] [08-19|09:04:11.006] [8/16 HashState] Unwinding started       from=130 to=129 storage=true codes=false

Wrong trie root of block 130, it will unwind and report bad header, then panic?

Other suspicious logs

[INFO] [08-19|09:03:44.000] [1/16 Headers] Processed                 highest inserted=125 age=0 blockHash=0x860954c6caee8662ab969fdd60b2a9c1e3adf0f8b5c867237057be5488eab25b
[WARN] [08-19|09:03:44.001] [6/16 Execution] Execution failed        block=125 hash=0x860954c6caee8662ab969fdd60b2a9c1e3adf0f8b5c867237057be5488eab25b err="supposed to get a a
ctual transaction, but get none"
[INFO] [08-19|09:03:44.001] UnwindTo                                 block=124 bad_block_hash=0x860954c6caee8662ab969fdd60b2a9c1e3adf0f8b5c867237057be5488eab25b
[INFO] [08-19|09:03:44.001] [6/16 Execution] Completed on            block=124
[INFO] [08-19|09:03:44.002] Timings (slower than 50ms)               Headers=2.995s

zeech3 avatar Aug 19 '22 09:08 zeech3

geth and erigon are handled differently.

the log of geth

4428     elapsed="407.705µs"
t=2022-08-19T09:03:51+0000 lvl=info msg="Sealing block with"                    number=128 delay=3.997976645s  headerDifficulty=1 val=0x0e7FeAe80bcc9a2Bca8a611Eb6bEB464139DF7c8
t=2022-08-19T09:03:54+0000 lvl=eror msg="\n########## BAD BLOCK #########\nChain config: {ChainID: 714 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0
 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 0, Bruno: 0, Berlin: <nil>, YOLO v3: <nil>, CatalystBlock: <nil>, London: <nil>, ArrowGl
acier: <nil>, MergeFork:<nil>, Euler: <nil>, Engine: parlia}\n\nNumber: 128\nHash: 0xa98f05f059d2bc8e8dd74caa9a81e46328292d0519792fd9258cac6455c701dc\n\t 0: cumulative: 1772559 gas: 17725
59 contract: 0x79f22946001b89c9508702FC97F311BD358E1289 status: 1 tx: 0xfe4256180f055ef428ecb177c9aab943426682e58c003e98e1df69ceb2de1728 logs: [0xc0019b48f0 0xc0019b49a0] bloom: 000000000
00000000000000000000000200000000000000000800000000000000000000200000000000000000000000000000000000000000000000000000000000000000000000000000008000000000001000000000000000000000000000000000000020000000000000000000800000000000000000000010010000000400000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000020000000000100000000000000400000000000000000000000000000000000000000 state: \n\n\nError: supposed to get a actual transaction, but get none\n##############################\n"
t=2022-08-19T09:03:55+0000 lvl=info msg="Waiting for received in turn block to process"
t=2022-08-19T09:03:56+0000 lvl=info msg="Process backoff time exhausted, start to seal block"
t=2022-08-19T09:03:56+0000 lvl=info msg="Successfully sealed new block"         number=128 sealhash=0x3932254cb9eb86af7ae63e31828cf644a1990cabace8eef560ce8da12c5d0f6a hash=0x8c444b956e89aafa034255afe0fe960c391562e8e1f546a1466912b15cfc02d3 elapsed=4.999s
t=2022-08-19T09:03:56+0000 lvl=info msg="<U+1F517> block reached canonical chain"       number=117 hash=0x1a27533999241f1b283dc90284c95344586a8ee4a64f58fdd6b109d7b7be2752
t=2022-08-19T09:03:56+0000 lvl=info msg="<U+1F528> mined potential block"               number=128 hash=0x8c444b956e89aafa034255afe0fe960c391562e8e1f546a1466912b15cfc02d3
t=2022-08-19T09:03:56+0000 lvl=info msg="Commit new mining work"                number=129 sealhash=0x025059c20df000000e177b0fb66a7e049dcf77e584b8ff599301a79bc9e74232 uncles=0 txs=1 gas=1,792,136 elapsed="624.731µs"
t=2022-08-19T09:03:56+0000 lvl=info msg="Signed recently, must wait for others"
t=2022-08-19T09:03:59+0000 lvl=info msg="Imported new chain segment"            blocks=1 txs=3 mgas=1.807 elapsed=1.017ms     mgasps=1776.390 number=129 hash=0xc53e0d63ee20bdfcb5175ea17ec436f83f5b768067cf2c569b424aaeca4956fb dirty="3.88 KiB"
t=2022-08-19T09:03:59+0000 lvl=info msg="Commit new mining work"                number=130 sealhash=0x5e81cec412842b4a19c9a6af3b66de2829c57f72b110684a4c0ef17c5037d434 uncles=0 txs=0 gas=0         elapsed="183.369µs"
t=2022-08-19T09:03:59+0000 lvl=info msg="Sealing block with"                    number=130 delay=4.996207042s  headerDifficulty=1 val=0x0e7FeAe80bcc9a2Bca8a611Eb6bEB464139DF7c8
t=2022-08-19T09:04:04+0000 lvl=info msg="Successfully sealed new block"         number=130 sealhash=0x5e81cec412842b4a19c9a6af3b66de2829c57f72b110684a4c0ef17c5037d434 hash=0x1ab460d54305ed10d2ef97ed5e8d0819b05f3783d9459dcbefa1081d89d71915 elapsed=4.997s
t=2022-08-19T09:04:04+0000 lvl=info msg="<U+1F528> mined potential block"               number=130 hash=0x1ab460d54305ed10d2ef97ed5e8d0819b05f3783d9459dcbefa1081d89d71915
t=2022-08-19T09:04:04+0000 lvl=info msg="Commit new mining work"                number=131 sealhash=0xf51e2a32630c95548790ae747342fe29ebaf3193dc7800b9ae8d4694e388ff77 uncles=0 txs=0 gas=24428     elapsed="435.676µs"
t=2022-08-19T09:04:04+0000 lvl=info msg="Signed recently, must wait for others"
t=2022-08-19T09:04:05+0000 lvl=eror msg="validate state failed"                 error="invalid merkle root (remote: 6e95875dac8de8b28a15aa14e16c08ab0a3f29da669fba08fdce0bce177e414b local: a5993611de27d478683bee89e941e574f797ea0086f9809c8f08e0e896ef57b5)"
t=2022-08-19T09:04:05+0000 lvl=eror msg="\n########## BAD BLOCK #########\nChain config: {ChainID: 714 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 0, Bruno: 0, Berlin: <nil>, YOLO v3: <nil>, CatalystBlock: <nil>, London: <nil>, ArrowGlacier: <nil>, MergeFork:<nil>, Euler: <nil>, Engine: parlia}\n\nNumber: 130\nHash: 0x1688b4be168b7f7ddb8183d600bf818709e258686384319ab8b8f0279925610a\n\n\nError: invalid merkle root (remote: 6e95875dac8de8b28a15aa14e16c08ab0a3f29da669fba08fdce0bce177e414b local: a5993611de27d478683bee89e941e574f797ea0086f9809c8f08e0e896ef57b5)\n##############################\n"
t=2022-08-19T09:04:05+0000 lvl=eror msg="validate state failed"                 error="invalid merkle root (remote: 6e95875dac8de8b28a15aa14e16c08ab0a3f29da669fba08fdce0bce177e414b local: a5993611de27d478683bee89e941e574f797ea0086f9809c8f08e0e896ef57b5)"
t=2022-08-19T09:04:05+0000 lvl=info msg="Skip duplicated bad block"             number=130 hash=0x1688b4be168b7f7ddb8183d600bf818709e258686384319ab8b8f0279925610a
t=2022-08-19T09:04:05+0000 lvl=eror msg="\n########## BAD BLOCK #########\nChain config: {ChainID: 714 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 0, Bruno: 0, Berlin: <nil>, YOLO v3: <nil>, CatalystBlock: <nil>, London: <nil>, ArrowGlacier: <nil>, MergeFork:<nil>, Euler: <nil>, Engine: parlia}\n\nNumber: 130\nHash: 0x1688b4be168b7f7ddb8183d600bf818709e258686384319ab8b8f0279925610a\n\n\nError: invalid merkle root (remote: 6e95875dac8de8b28a15aa14e16c08ab0a3f29da669fba08fdce0bce177e414b local: a5993611de27d478683bee89e941e574f797ea0086f9809c8f08e0e896ef57b5)\n##############################\n"
t=2022-08-19T09:04:10+0000 lvl=info msg="Imported new chain segment"            blocks=1 txs=1 mgas=0.024 elapsed="623.26µs"  mgasps=39.194   number=131 hash=0xf1f6a13672b9d5601487e5dd15ca52ce8ee242cccc55e72f5e679bf459a74e20 dirty="2.13 KiB"
t=2022-08-19T09:04:10+0000 lvl=info msg="<U+1F517> block reached canonical chain"       number=120 hash=0x05094d34f5bd0e39ddfa7cee929f79a2687776e30cc63f946b17e458102ca73b
t=2022-08-19T09:04:10+0000 lvl=info msg="Commit new mining work"                number=132 sealhash=0xbcc07ee65c08887078b3f011ace267cc7438558b4af9d477fee33e06c3357819 uncles=0 txs=1 gas=1,792,136 elapsed="705.351µs"

the log of erigon

[INFO] [08-19|09:03:54.004] RPC Daemon notified of new headers       from=127 to=128 header sending=4.996µs log sending=148ns
[INFO] [08-19|09:03:54.004] [1/16 Headers] Waiting for headers...    from=127
[INFO] [08-19|09:03:56.003] [1/16 Headers] Processed                 highest inserted=128 age=1s blockHash=0x8c444b956e89aafa034255afe0fe960c391562e8e1f546a1466912b15cfc02d3
[INFO] [08-19|09:03:56.003] [6/16 Execution] Completed on            block=128
[INFO] [08-19|09:03:56.004] Timings (slower than 50ms)               Headers=1.999s
[INFO] [08-19|09:03:56.004] Tables                                   PlainState=12.0KB AccountChangeSet=4.0KB StorageChangeSet=12.0KB BlockTransaction=36.0KB TransactionLog=4.0KB FreeList=4.0KB ReclaimableSpace=4.0MB
[INFO] [08-19|09:03:56.007] Commit cycle                             in=2.665748ms
[INFO] [08-19|09:03:56.007] RPC Daemon notified of new headers       from=127 to=128 header sending=6.064µs log sending=142ns
[INFO] [08-19|09:03:56.007] [1/16 Headers] Waiting for headers...    from=128
[INFO] [08-19|09:03:56.092] [1/5 MiningCreateBlock] Start mine       block=129 baseFee=<nil> gasLimit=37236142
[INFO] [08-19|09:03:56.093] Sealing block with                       number=129 delay=3.906605824s headerDifficulty=1 val=0xeE06551B99525B8C3ef93f6472caC2F79Aa584AF hash=0x4825dd9f2c6db4c24c235b21de0a65e650de84569e80aa3142a38469736d8e4f
[INFO] [08-19|09:03:59.002] [1/16 Headers] Processed                 highest inserted=129 age=0 blockHash=0xc53e0d63ee20bdfcb5175ea17ec436f83f5b768067cf2c569b424aaeca4956fb
[INFO] [08-19|09:03:59.005] [6/16 Execution] Completed on            block=129
[INFO] [08-19|09:03:59.005] Timings (slower than 50ms)               Headers=2.995s
[INFO] [08-19|09:03:59.005] Tables                                   PlainState=12.0KB AccountChangeSet=4.0KB StorageChangeSet=12.0KB BlockTransaction=48.0KB TransactionLog=4.0KB FreeList=4.0KB ReclaimableSpace=4.0MB
[INFO] [08-19|09:03:59.010] Commit cycle                             in=5.165383ms
[INFO] [08-19|09:03:59.011] RPC Daemon notified of new headers       from=128 to=129 header sending=6.713µs log sending=149ns
[INFO] [08-19|09:03:59.011] [1/16 Headers] Waiting for headers...    from=129
[INFO] [08-19|09:03:59.099] [1/5 MiningCreateBlock] Start mine       block=130 baseFee=<nil> gasLimit=37272505
[INFO] [08-19|09:03:59.100] Sealing block with                       number=130 delay=5.89959141s headerDifficulty=1 val=0xeE06551B99525B8C3ef93f6472caC2F79Aa584AF hash=0x55538b17faaaf744ca533494a929219de02caf3b6c19b4b7a1264195ba04a8bd
[INFO] [08-19|09:04:02.105] [1/5 MiningCreateBlock] Start mine       block=130 baseFee=<nil> gasLimit=37272505
[INFO] [08-19|09:04:02.106] Sealing block with                       number=130 delay=2.893556898s headerDifficulty=1 val=0xeE06551B99525B8C3ef93f6472caC2F79Aa584AF hash=0x55538b17faaaf744ca533494a929219de02caf3b6c19b4b7a1264195ba04a8bd
[INFO] [08-19|09:04:04.002] [1/16 Headers] Processed                 highest inserted=130 age=0 blockHash=0x1ab460d54305ed10d2ef97ed5e8d0819b05f3783d9459dcbefa1081d89d71915
[INFO] [08-19|09:04:04.002] [6/16 Execution] Completed on            block=130
[EROR] [08-19|09:04:04.003] [9/16 IntermediateHashes] Wrong trie root of block 130: 6e95875dac8de8b28a15aa14e16c08ab0a3f29da669fba08fdce0bce177e414b, expected (from header): a5993611de27d478683bee89e941e574f797ea0086f9809c8f08e0e896ef57b5. Block hash: 1ab460d54305ed10d2ef97ed5e8d0819b05f3783d9459dcbefa1081d89d71915
[EROR] [08-19|09:04:04.003] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, trace: [stageloop.go:129 panic.go:844 panic.go:220 signal_unix.go:818 stage_interhashes.go:118 default_stages.go:126 sync.go:347 sync.go:255 stageloop.go:163 stageloop.go:84 asm_amd64.s:1571]"
[INFO] [08-19|09:04:04.503] Commit cycle                             in=7.295µs
[INFO] [08-19|09:04:04.503] RPC Daemon notified of new headers       from=129 to=130 header sending=7.23µs log sending=157ns
[INFO] [08-19|09:04:04.503] [1/16 Headers] Waiting for headers...    from=129
[INFO] [08-19|09:04:05.001] [1/16 Headers] Processed                 highest inserted=130 age=0 blockHash=0x1688b4be168b7f7ddb8183d600bf818709e258686384319ab8b8f0279925610a
[INFO] [08-19|09:04:05.001] [6/16 Execution] Completed on            block=130
[INFO] [08-19|09:04:05.001] Timings (slower than 50ms)               Headers=497ms
[INFO] [08-19|09:04:05.001] Tables                                   PlainState=12.0KB AccountChangeSet=4.0KB StorageChangeSet=12.0KB BlockTransaction=48.0KB TransactionLog=4.0KB FreeList=4.0KB ReclaimableSpace=4.0MB
[INFO] [08-19|09:04:05.004] Commit cycle                             in=2.698264ms
[INFO] [08-19|09:04:05.004] RPC Daemon notified of new headers       from=129 to=130 header sending=7.634µs log sending=166ns
[INFO] [08-19|09:04:05.004] [1/16 Headers] Waiting for headers...    from=130
[INFO] [08-19|09:04:05.112] [1/5 MiningCreateBlock] Start mine       block=131 baseFee=<nil> gasLimit=37126911
[INFO] [08-19|09:04:05.113] [parlia] Signed recently, must wait for others
[INFO] [08-19|09:04:08.085] [1/5 MiningCreateBlock] Start mine       block=131 baseFee=<nil> gasLimit=37126911
[INFO] [08-19|09:04:08.086] [5/5 MiningFinish] block ready for seal  blocn_num=131 transactions=1 gas_used=1807136 gas_limit=37126911 difficulty=2
[INFO] [08-19|09:04:08.086] [parlia] Signed recently, must wait for others
[INFO] [08-19|09:04:11.005] UnwindTo                                 block=129 bad_block_hash=0x0000000000000000000000000000000000000000000000000000000000000000
[INFO] [08-19|09:04:11.005] [1/16 Headers] Processed                 highest inserted=131 age=1s blockHash=0xf1f6a13672b9d5601487e5dd15ca52ce8ee242cccc55e72f5e679bf459a74e20
[INFO] [08-19|09:04:11.006] [8/16 HashState] Unwinding started       from=130 to=129 storage=false codes=true
[INFO] [08-19|09:04:11.006] [8/16 HashState] Unwinding started       from=130 to=129 storage=false codes=false
[INFO] [08-19|09:04:11.006] [8/16 HashState] Unwinding started       from=130 to=129 storage=true codes=false
[INFO] [08-19|09:04:11.006] [9/16 IntermediateHashes] Unwinding      from=130 to=129 csbucket=AccountChangeSet
[INFO] [08-19|09:04:11.006] [9/16 IntermediateHashes] Unwinding      from=130 to=129 csbucket=StorageChangeSet
[EROR] [08-19|09:04:11.006] Staged Sync                              err="[9/16 IntermediateHashes] wrong trie root: 6e95875dac8de8b28a15aa14e16c08ab0a3f29da669fba08fdce0bce177e414b, expected (from header): a5993611de27d478683bee89e941e574f797ea0086f9809c8f08e0e896ef57b5"
[INFO] [08-19|09:04:11.092] [1/5 MiningCreateBlock] Start mine       block=131 baseFee=<nil> gasLimit=37126911
[INFO] [08-19|09:04:11.094] [5/5 MiningFinish] block ready for seal  blocn_num=131 transactions=1 gas_used=1807136 gas_limit=37126911 difficulty=2
[INFO] [08-19|09:04:11.094] [parlia] Signed recently, must wait for others
[INFO] [08-19|09:04:11.507] [8/16 HashState] Unwinding started       from=130 to=129 storage=false codes=true
[INFO] [08-19|09:04:11.507] [8/16 HashState] Unwinding started       from=130 to=129 storage=false codes=false
[INFO] [08-19|09:04:11.507] [8/16 HashState] Unwinding started       from=130 to=129 storage=true codes=false
[INFO] [08-19|09:04:11.507] [9/16 IntermediateHashes] Unwinding      from=130 to=129 csbucket=AccountChangeSet
[INFO] [08-19|09:04:11.507] [9/16 IntermediateHashes] Unwinding      from=130 to=129 csbucket=StorageChangeSet

erigon will keep trying to unwind, but keep failing

The block 130 information:

> eth.getBlock(130)
{
  difficulty: 1,
  extraData: "0xd98301010c846765746889676f312e31372e3132856c696e7578000017f4c06dfcd6246332e6bb4e5d1724dbb1bddc4ec1b8762897fe8ec06458bb99beece7580be8cf1591652348556dacc5953ceedf29347d28a815f3c5ff3eda356ae6d40100",
  gasLimit: 37455210,
  gasUsed: 0,
  hash: "0x1ab460d54305ed10d2ef97ed5e8d0819b05f3783d9459dcbefa1081d89d71915",
  logsBloom: "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  miner: "0x0e7feae80bcc9a2bca8a611eb6beb464139df7c8",
  mixHash: "0x0000000000000000000000000000000000000000000000000000000000000000",
  nonce: "0x0000000000000000",
  number: 130,
  parentHash: "0xc53e0d63ee20bdfcb5175ea17ec436f83f5b768067cf2c569b424aaeca4956fb",
  receiptsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 608,
  stateRoot: "0xa5993611de27d478683bee89e941e574f797ea0086f9809c8f08e0e896ef57b5",
  timestamp: 1660899844,
  totalDifficulty: 247,
  transactions: [],
  transactionsRoot: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
  uncles: []
}

zeech3 avatar Aug 19 '22 10:08 zeech3

mainnet? or it's some private mining devnet?

@AskAlexSharov Confirming that I've encountered the same error on polygon during synchronization and the same error when trying to unwind. Erigon version is devel hash 6e18bef6

During synchronization:

[INFO] [08-15|21:13:56.411] [6/16 Execution] Completed on            block=31943872
[EROR] [08-15|21:13:57.035] [9/16 IntermediateHashes] Wrong trie root of block 31943872: 96634927ee96966189f9c39124c9c1a2afa0157a8cf4546e544480202206f4ba, expected (from header): 91e920ce8d81c111d41e0fa6447eebd85e1f21be0d3ccccefcc451fe1d7fe168. Block hash: 3b44dd6671819616efd265ee39efe84c51925ad630a47d51fd6b8df2beb499ce 
[WARN] [08-15|21:13:57.035] Unwinding due to incorrect root hash     to=31943871
[INFO] [08-15|21:13:57.035] UnwindTo                                 block=31943871 bad_block_hash=0x3b44dd6671819616efd265ee39efe84c51925ad630a47d51fd6b8df2beb499ce
[INFO] [08-15|21:13:57.035] [8/16 HashState] Unwinding started       from=31943872 to=31943871 storage=false codes=true
[INFO] [08-15|21:13:57.037] [8/16 HashState] Unwinding started       from=31943872 to=31943871 storage=false codes=false
[INFO] [08-15|21:13:57.039] [8/16 HashState] Unwinding started       from=31943872 to=31943871 storage=true codes=false
[INFO] [08-15|21:13:57.048] [6/16 Execution] Unwind Execution        from=31943872 to=31943871

Then it would repeatedly fail here on execution:

[INFO] [08-19|12:49:42.852] [6/16 Execution] Blocks execution        from=32009919 to=32083335
[WARN] [08-19|12:49:42.855] [6/16 Execution] Execution failed        block=32009920 hash=0xed6169f7dbf39fcb45b7e62b47dafc46bdd3793afa10909f74b797841f68ca2e err="could not apply tx 0 from block 32009920 [0x7aa5d2b98d05e84ddd75842a98124538334b1f7264386a72c363049fbf19d9d5]: nonce too low: address 0x2db560c2C2bf877eC2e5beAd9d7213619A8eF739, tx: 288537 state: 288538"
[INFO] [08-19|12:49:42.856] UnwindTo                                 block=32009919 bad_block_hash=0xed6169f7dbf39fcb45b7e62b47dafc46bdd3793afa10909f74b797841f68ca2e
[INFO] [08-19|12:49:42.856] [6/16 Execution] Completed on            block=32009919

And when I ran integration state_stages --unwind=10 in an attempt to fix it:

INFO[08-20|10:52:39.376] [9/16 IntermediateHashes] Regeneration ended 
EROR[08-20|10:52:39.380] [9/16 IntermediateHashes] Wrong trie root of block 32009869: 514b7ef5ca20659ee71bc729a2c961f26003132569cd8f4c3517a22a41dcd3ae, expected (from header): e0f63d4bf8f391d0e118399025487c8cc4d4eeef4a9b2802c3f1d46f96fd2a08. Block hash: 67c067d38b18db847d45e9aa3f48ae58b0cc558ed1db4d7fae0f2c56eba8ba80 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1891287]

goroutine 1 [running]:
github.com/ledgerwatch/erigon/eth/stagedsync.SpawnIntermediateHashesStage(0xc0371847c0, {0x21d35a0, 0xc09b470370}, {0x21f0990, 0xc0a27c2480}, {{0x21e7860, 0xc0002ff1e0}, 0x1, 0x0, {0xc000c106c0, ...}, ...}, ...)
	github.com/ledgerwatch/erigon/eth/stagedsync/stage_interhashes.go:108 +0x1227
github.com/ledgerwatch/erigon/eth/stagedsync.DefaultStages.func25(0x0, 0x0, 0xc0371847c0, {0x21d35a0, 0xc09b470370}, {0x21f0990, 0xc0a27c2480})
	github.com/ledgerwatch/erigon/eth/stagedsync/default_stages.go:126 +0x105
github.com/ledgerwatch/erigon/eth/stagedsync.(*Sync).runStage(0xc09b470370, 0xc09b45f4f0, {0x21e7860, 0xc0002ff1e0}, {0x21f0990, 0xc0a27c2480}, 0x0, 0x0)
	github.com/ledgerwatch/erigon/eth/stagedsync/sync.go:347 +0x257
github.com/ledgerwatch/erigon/eth/stagedsync.(*Sync).Run(0xc09b470370, {0x21e7860, 0xc0002ff1e0}, {0x21f0990, 0xc0a27c2480}, 0x0)
	github.com/ledgerwatch/erigon/eth/stagedsync/sync.go:255 +0x6fd
github.com/ledgerwatch/erigon/cmd/integration/commands.syncBySmallSteps({0x21e7860, 0xc0002ff1e0}, {0x0, 0x0, 0x0, {0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...)
	github.com/ledgerwatch/erigon/cmd/integration/commands/state_stages.go:290 +0x11d5
github.com/ledgerwatch/erigon/cmd/integration/commands.glob..func25(0x29c2380, {0xc00139d680, 0x0, 0x3})
	github.com/ledgerwatch/erigon/cmd/integration/commands/state_stages.go:63 +0x2e6
github.com/spf13/cobra.(*Command).execute(0x29c2380, {0xc00139d620, 0x3, 0x3})
	github.com/spf13/[email protected]/command.go:872 +0xb3c
github.com/spf13/cobra.(*Command).ExecuteC(0x29c3500)
	github.com/spf13/[email protected]/command.go:990 +0x8b3
github.com/spf13/cobra.(*Command).Execute(0x29c3500)
	github.com/spf13/[email protected]/command.go:918 +0x2f
github.com/spf13/cobra.(*Command).ExecuteContext(0x29c3500, {0x21e1320, 0xc00071c100})
	github.com/spf13/[email protected]/command.go:911 +0x73
main.main()
	github.com/ledgerwatch/erigon/cmd/integration/main.go:15 +0x65

Would this require a complete resynchronization?

michaelscheung avatar Aug 20 '22 18:08 michaelscheung

Narrowed it down to integration_stage_trie:

user@UbuntuAmd:/mnt/raid_array/erigon/erigon-release$  ./build/bin/integration stage_trie --unwind=100  --datadir="/mnt/md4/erigon/polygon/erigon/datadir"   --chain="bor-mainnet" 
INFO[08-20|12:37:33.264] StageExec                                progress=32009769
INFO[08-20|12:37:33.264] StageTrie                                progress=32009919
INFO[08-20|12:37:33.266] [9/16 IntermediateHashes] Unwinding      from=32009919 to=32009819 csbucket=AccountChangeSet
INFO[08-20|12:37:33.270] [9/16 IntermediateHashes] Unwinding      from=32009919 to=32009819 csbucket=StorageChangeSet
EROR[08-20|12:37:33.293] Error                                    err="wrong trie root: dcc4f1144626581be73d90abec3583cf82b0cb605c64a0f99b26d67c0fa7d0b5, expected (from header): 63130046221237d8d72339f696a9f6dd40715237c95873e9af192de7a7e8bf44"
Error: wrong trie root: dcc4f1144626581be73d90abec3583cf82b0cb605c64a0f99b26d67c0fa7d0b5, expected (from header): 63130046221237d8d72339f696a9f6dd40715237c95873e9af192de7a7e8bf44
Usage:
  integration stage_trie [flags]

michaelscheung avatar Aug 20 '22 19:08 michaelscheung

@michaelscheung It seems we have the same problem. Syncstage will get stuck when there is a trie root calculation error.

zeech3 avatar Aug 21 '22 07:08 zeech3

Have you ever fix it? @michaelscheung

zeech3 avatar Aug 22 '22 05:08 zeech3

Have you ever fix it? @michaelscheung

@fynn-z I'm currently attempting to re-synchronize a backup I took two months ago on an older version of Erigon. I'll let you know when it's done synchronizing or if it encounters an error again. Have you tried re-synchronizing a backup, or re-synchronizing from scratch if it can be done in a reasonable amount of time and you don't have a backup? I am re-synchronizing on erigon version 2022.07.3-alpha-af9f49ec

michaelscheung avatar Aug 22 '22 19:08 michaelscheung

Have you ever fix it? @michaelscheung

@fynn-z I'm currently attempting to re-synchronize a backup I took two months ago on an older version of Erigon. I'll let you know when it's done synchronizing or if it encounters an error again. Have you tried re-synchronizing a backup, or re-synchronizing from scratch if it can be done in a reasonable amount of time and you don't have a backup? I am re-synchronizing on erigon version 2022.07.3-alpha-af9f49ec

I have fixed it today. In my case, I find a panic occur while stage syncing. But in order to avoid crash, Erigon will print the panic stack and then continue processing the next round of stage sync.

func MiningStep(ctx context.Context, kv kv.RwDB, mining *stagedsync.Sync) (err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("%+v, trace: %s", rec, dbg.Stack())
		}
	}() // avoid crash because Erigon's core does many things

	tx, err := kv.BeginRo(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	miningBatch := memdb.NewMemoryBatch(tx)
	defer miningBatch.Rollback()

	if err = mining.Run(nil, miningBatch, false); err != nil {
		return err
	}
	tx.Rollback()
	return nil
}

Dirty data is not cleaned up here, causing the next round of unwind failed in wrong trie root.

#5146

BTW, @AskAlexSharov , There needs to be a better way to handle panic during stagesync, crash may be the easiest and most efficient way, continuing to run stage sync on the wrong Intermediate hash may cause data corruption and irrecoverable.

zeech3 avatar Aug 23 '22 02:08 zeech3

Nice detective work @fynn-z!

Does this mean that the node will still need to be resynchronized since the data is already corrupted and can't be unwound? Given the very long time needed to synchronize these larger chains, I am inclined to agree that a crash is much preferred over possible data corruption. I'll pull from your branch and try to run both my broken node and the one that's re-synchronizing with your fix.

michaelscheung avatar Aug 23 '22 04:08 michaelscheung

Trie root error may be fixed by force unwind: integration state_stages —unwind=100 integration stage_headers —unwind=100

AskAlexSharov avatar Aug 23 '22 04:08 AskAlexSharov

@AskAlexSharov Below was the error I got when I tried to unwind state_stages (stage_headers works fine). I had tried unwinding anywhere from 10 to 10,000 blocks, same error. I tried doing one stage at a time, and it would fail on stage_trie (see some of my previous posts). The wrong trie root is also preventing it from unwinding.

INFO[08-20|10:52:39.376] [9/16 IntermediateHashes] Regeneration ended 
EROR[08-20|10:52:39.380] [9/16 IntermediateHashes] Wrong trie root of block 32009869: 514b7ef5ca20659ee71bc729a2c961f26003132569cd8f4c3517a22a41dcd3ae, expected (from header): e0f63d4bf8f391d0e118399025487c8cc4d4eeef4a9b2802c3f1d46f96fd2a08. Block hash: 67c067d38b18db847d45e9aa3f48ae58b0cc558ed1db4d7fae0f2c56eba8ba80 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1891287]

goroutine 1 [running]:
github.com/ledgerwatch/erigon/eth/stagedsync.SpawnIntermediateHashesStage(0xc0371847c0, {0x21d35a0, 0xc09b470370}, {0x21f0990, 0xc0a27c2480}, {{0x21e7860, 0xc0002ff1e0}, 0x1, 0x0, {0xc000c106c0, ...}, ...}, ...)
	github.com/ledgerwatch/erigon/eth/stagedsync/stage_interhashes.go:108 +0x1227
github.com/ledgerwatch/erigon/eth/stagedsync.DefaultStages.func25(0x0, 0x0, 0xc0371847c0, {0x21d35a0, 0xc09b470370}, {0x21f0990, 0xc0a27c2480})
	github.com/ledgerwatch/erigon/eth/stagedsync/default_stages.go:126 +0x105
github.com/ledgerwatch/erigon/eth/stagedsync.(*Sync).runStage(0xc09b470370, 0xc09b45f4f0, {0x21e7860, 0xc0002ff1e0}, {0x21f0990, 0xc0a27c2480}, 0x0, 0x0)
	github.com/ledgerwatch/erigon/eth/stagedsync/sync.go:347 +0x257
github.com/ledgerwatch/erigon/eth/stagedsync.(*Sync).Run(0xc09b470370, {0x21e7860, 0xc0002ff1e0}, {0x21f0990, 0xc0a27c2480}, 0x0)
	github.com/ledgerwatch/erigon/eth/stagedsync/sync.go:255 +0x6fd
github.com/ledgerwatch/erigon/cmd/integration/commands.syncBySmallSteps({0x21e7860, 0xc0002ff1e0}, {0x0, 0x0, 0x0, {0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...)
	github.com/ledgerwatch/erigon/cmd/integration/commands/state_stages.go:290 +0x11d5
github.com/ledgerwatch/erigon/cmd/integration/commands.glob..func25(0x29c2380, {0xc00139d680, 0x0, 0x3})
	github.com/ledgerwatch/erigon/cmd/integration/commands/state_stages.go:63 +0x2e6
github.com/spf13/cobra.(*Command).execute(0x29c2380, {0xc00139d620, 0x3, 0x3})
	github.com/spf13/[email protected]/command.go:872 +0xb3c
github.com/spf13/cobra.(*Command).ExecuteC(0x29c3500)
	github.com/spf13/[email protected]/command.go:990 +0x8b3
github.com/spf13/cobra.(*Command).Execute(0x29c3500)
	github.com/spf13/[email protected]/command.go:918 +0x2f
github.com/spf13/cobra.(*Command).ExecuteContext(0x29c3500, {0x21e1320, 0xc00071c100})
	github.com/spf13/[email protected]/command.go:911 +0x73
main.main()
	github.com/ledgerwatch/erigon/cmd/integration/main.go:15 +0x65

michaelscheung avatar Aug 23 '22 04:08 michaelscheung

@michaelscheung to fix nil pointer you meed update erigon. To see what stages you need run, do “integration print_stages” If mess up - can do “integration stage_trie —reset”

AskAlexSharov avatar Aug 23 '22 04:08 AskAlexSharov

I did unwind state_stages as well as all of the individual stages on the latest version of Erigon, but kept encountering the same error. Resetting the stage_trie sounds like it might work because the states seems to be corrupted. I will try that and report back. Thanks!

michaelscheung avatar Aug 23 '22 08:08 michaelscheung

Okay after running integration stage_trie —reset node, I have again encountered the same error

EROR[08-23|03:39:23.004] [9/16 IntermediateHashes] Wrong trie root of block 32008831: 20639276ae2cd71b0f04af0977f6a4e66a97f651ddcfc5adc2e4564210ce01e7, expected (from header): 6095bf311356aaf1c6bd44f79149f829e51f3040e0c5b5f0e781a677781d1a89. Block hash: 83208d82076352ab49d009ff76f74a2b8fa7628187e0cb66359e8d3046e5e183  

michaelscheung avatar Aug 23 '22 10:08 michaelscheung

without integration print_stages i have no idea in what state your node.

AskAlexSharov avatar Aug 23 '22 10:08 AskAlexSharov

Apologies here it is

user@UbuntuAmd:/mnt/raid_array/erigon/erigon-fynn-z$ ./build/bin/integration print_stages  --datadir="/mnt/md4/erigon/polygon/erigon/datadir"   --chain="bor-mainnet" 
Note: prune_at doesn't mean 'all data before were deleted' - it just mean stage.Prune function were run to this block. Because 1 stage may prune multiple data types to different prune distance.

 				 stage_at 	 prune_at
Headers 			 32221812 	 0
BlockHashes 			 32221812 	 0
Bodies 				 32221812 	 0
Senders 			 32221812 	 0
Execution 			 32008831 	 32009919
Translation 			 0 		 0
HashState 			 32008831 	 0
IntermediateHashes 		 0 		 0
AccountHistoryIndex 		 32008831 	 0
StorageHistoryIndex 		 32008831 	 0
LogIndex 			 32008831 	 0
CallTraces 			 32008831 	 32009919
TxLookup 			 32008831 	 31919000
Finish 				 32008831 	 0
--
prune distance: 

sequence: EthTx=2026287385, NonCanonicalTx=91637

first header in db: 31919000
first body in db: 31919000

--
snapsthos: blocks=31918999, segments=31918999, indices=31918999

michaelscheung avatar Aug 23 '22 10:08 michaelscheung

Execution 32008831 32009919 how far did you unwind? 10K blocks?

AskAlexSharov avatar Aug 23 '22 13:08 AskAlexSharov

Yes, at least I tried to, but I was unable to unwind stage_trie. Even if I reset and redid the entire step, it would encounter the same error. Also, I've been synchronizing a separate node from a backup took two months ago using @fynn-z's fix, but that has also hit the same error, albeit at at a different block number:

EROR[08-22|23:35:35.455] [9/16 IntermediateHashes] Wrong trie root of block 31342399: e437ddbf8845d220f8942f8d22eac52fbd023c5ac4d537bbe73184a69c0229ca, expected (from header): ca06b74541280aba7ef7c04d62ebb92135983eab3afff26c51b61abc6e400c14. Block hash: 8380bfe98f1011c33727f1eb76a2
d20b7ac617a76cc9ff3a4acb944ef66f00ae                                                                                                                                                                                                                                                      
WARN[08-22|23:35:35.455] Unwinding due to incorrect root hash     to=31260958        

Current stages for the restored backup node:

user@UbuntuAmdServer:/mnt/raid/erigon/erigon-fynn-z$ ./build/bin/integration print_stages  --datadir="/mnt/raid/erigon/polygon/erigon/datadir"   --chain="bor-mainnet"
Note: prune_at doesn't mean 'all data before were deleted' - it just mean stage.Prune function were run to this block. Because 1 stage may prune multiple data types to different prune distance.

 				 stage_at 	 prune_at
Headers 			 31260958 	 0
BlockHashes 			 31260958 	 0
Bodies 				 31260958 	 0
Senders 			 31260958 	 0
Execution 			 31260958 	 31260958
Translation 			 0 		 0
HashState 			 31260958 	 0
IntermediateHashes 		 31260958 	 31260958
AccountHistoryIndex 		 31260958 	 0
StorageHistoryIndex 		 31260958 	 0
LogIndex 			 31260958 	 0
CallTraces 			 31260958 	 31260958
TxLookup 			 31260958 	 30000000
Finish 				 31260958 	 0
--
prune distance: 

sequence: EthTx=1951422848, NonCanonicalTx=103

first header in db: 29000100
first body in db: 29000100

--
snapsthos: blocks=30499999, segments=30499999, indices=30499999

The next thing I can try is to resynchronized from that two month old back up again, but this time using an old version of Erigon

michaelscheung avatar Aug 24 '22 01:08 michaelscheung

  1. yes, you can try older version of erigon

  2. with new or old version of erigon, you can try go forward step-by-step instead of 1 big jump. example:

# restore from backup
integration print_stages
# find block number of stage_exec, let's call it N
STOP_BEFORE_STAGE=Execution ./build/bin/erigon <your_flags>
# wait for completion
integration state_stages --unwind=1 --unwind.every=1000
# it will move forward by 1000 blocks jumps

AskAlexSharov avatar Aug 24 '22 01:08 AskAlexSharov

My polygon node synchronized from backup seems to have been stuck on some kind of queue for the past few days. I don't remember seeing this before, so I wonder if it has anything to do with the trie root error

INFO[08-24|02:48:53.180] {31214008-31555892 links=341885 (31214008-31555892)} => cfecefa444fcb2b153cc4dbe93434e023179fdcca1e04a2aaf4f350dac16c088, anchorQueue.idx=1, next retry in 4.994993395s 
INFO[08-24|02:48:53.180] {31836287-32263343 links=427059 (31836287-32263256-32263343)} => 9ec6bf78f10e90343bfe5093efcd844092eca1eabf14b828a477222ce39384e9, anchorQueue.idx=0, next retry in 1.384098275s 
INFO[08-24|02:48:54.228] [txpool] stat                            block=31260974 pending=10000 baseFee=361 queued=30000 alloc=9.5GB sys=17.3GB
INFO[08-24|02:49:08.282] [snapshots] Dumping txs                  block num=30823571 alloc=11.1GB sys=17.3GB
INFO[08-24|02:49:13.140] Req/resp stats                           req=1725 reqMin=31212196 reqMax=31836286 skel=20 skelMin=31262516 skelMax=31557236 resp=1788 respMin=31212202 respMax=32263353 dups=39007
INFO[08-24|02:49:13.181] Queue sizes                              anchors=2 links=785165 persisted=0

michaelscheung avatar Aug 24 '22 09:08 michaelscheung

Update: I was able to achieve full synchronization using erigon version 2022.08.1-alpha-7dabdc3 on a backup I took two months ago. So the bug appears to be something that was introduced sometime after that.

michaelscheung avatar Aug 30 '22 00:08 michaelscheung

This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] avatar Oct 09 '22 03:10 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Oct 16 '22 04:10 github-actions[bot]