reth
stack overflowing on bad block
Describe the bug
reth crashes with a stack overflow after an invalid block error warning.
Steps to reproduce
run reth, run prysm.
Node logs
2024-02-06T23:17:39.351682Z INFO reth::cli: Status connected_peers=17 freelist=145148 latest_block=19171394
2024-02-06T23:18:04.352274Z INFO reth::cli: Status connected_peers=17 freelist=145148 latest_block=19171394
2024-02-06T23:18:29.352727Z INFO reth::cli: Status connected_peers=18 freelist=145148 latest_block=19171394
2024-02-06T23:18:36.844437Z INFO blockchain_tree: Block is already canonical, ignoring. block_hash=0xd58665483eb5b6d328841ee4613dcee42905dcb0fe7cae6b5cbf5c954c968fdd
2024-02-06T23:18:36.847416Z INFO reth::commands::node::events: Forkchoice updated head_block_hash=0xd58665483eb5b6d328841ee4613dcee42905dcb0fe7cae6b5cbf5c954c968fdd safe_block_hash=0xc848b4152a073dd16770a1efff5dbe7d9608c96bdf1dcc4e35673ce099a21c6a finalized_block_hash=0xc848b4152a073dd16770a1efff5dbe7d9608c96bdf1dcc4e35673ce099a21c6a status=Valid
2024-02-06T23:18:54.351946Z INFO reth::cli: Status connected_peers=19 freelist=145148 latest_block=19171394
2024-02-06T23:19:02.559425Z WARN consensus::engine: Error while processing payload error=InsertBlockError { error: Consensus(BodyStateRootDiff(GotExpected { got: 0xd3d609b5f2fd0910dd275537799ca8d3f47a32ec136a191aa32c56dd43833512, expected: 0x7da550c658bd3cac17918db97e0c5231e39fa5f2991ddfb5f889431955ece45b })), hash: 0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce, number: 19171395, parent_hash: 0x5a5e6a2872497952fae8f3285ea2edcb24a51c1cb97a3deec54e78896d22205e, num_txs: 225, .. }
2024-02-06T23:19:02.559445Z WARN consensus::engine: Invalid block error on new payload invalid_hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce invalid_number=19171395 error=Consensus(BodyStateRootDiff(GotExpected { got: 0xd3d609b5f2fd0910dd275537799ca8d3f47a32ec136a191aa32c56dd43833512, expected: 0x7da550c658bd3cac17918db97e0c5231e39fa5f2991ddfb5f889431955ece45b }))
2024-02-06T23:19:02.559470Z WARN consensus::engine: Bad block with hash hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce header=Header { parent_hash: 0x5a5e6a2872497952fae8f3285ea2edcb24a51c1cb97a3deec54e78896d22205e, ommers_hash: 0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347, beneficiary: 0x4838b106fce9647bdf1e7877bf73ce8b0bad5f97, state_root: 0x7da550c658bd3cac17918db97e0c5231e39fa5f2991ddfb5f889431955ece45b, transactions_root: 0xcc73c4aa12b10950ad4f6ba66becbe25547a452c0a7a5434dd175e6b9bc3b1f8, receipts_root: 0x71858150812ba584da63a4fd122cd4f5a3d4fd0561fedcffeeb54e73fa39ede8, withdrawals_root: Some(0x56bb62bf4e998236198c1f7e1f301e6b994e7adfe352cc36ecafed94c0c0240d), logs_bloom: 0x9aed4175412773431dbe5fae8fadf2e793cf8a645e8fa81c4f1bb67785af31799b8ed7afabf6f4f2627ebd2a8e350ded8a91e257ab062fbacec937804fff3c866956d6987b26a8a8fda35dfecf90c8f9e68551d894e7ca425b291ebfc3a7c22bcdcb55ee8af72d44698e30d92dec78f8ae7d5ed34a1af64dde22809beaba1ca5360b17dd8f6410b8246dbd4e4bc802e6832b582bcbce1ee906e9a9fa28ffa8dfae0b490aa993fa97fa695682ba92e7e8af72d7813ff5fbc05cec263ac5ce0ae0f19f750ed286bfdb62419e1249bf0b87d7e3f366bf9b6858f42f3187d9ace4139dbeacbf61aa3530de65d5d2aa7d5fddbf8a93b0db4e1462a4d9b89977b75e5b, difficulty: 0x0_U256, number: 19171395, gas_limit: 30000000, gas_used: 20798306, timestamp: 1707249179, mix_hash: 0x7fdc5f9d136bc600a842bd313d707449c63c290bb1fb3dea787c1cdff9399213, nonce: 0, base_fee_per_gas: Some(29489621760), blob_gas_used: None, excess_blob_gas: None, parent_beacon_block_root: None, extra_data: 0x546974616e2028746974616e6275696c6465722e78797a29 }
2024-02-06T23:19:19.351702Z INFO reth::cli: Status connected_peers=21 freelist=145148 latest_block=19171394
2024-02-06T23:19:44.352767Z INFO reth::cli: Status connected_peers=24 freelist=145148 latest_block=19171394
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
Platform(s)
Linux (x86)
What version/commit are you on?
alpha.17
What database version are you on?
1
What type of node are you running?
Full via --full flag
What prune config do you use, if any?
No response
If you've built Reth from source, provide the full command you used
No response
Code of Conduct
- [X] I agree to follow the Code of Conduct
hm, it actually also crashes right after running, without getting to the bad block, full execution log:
2024-02-06T23:31:57.957928Z INFO reth::cli: reth 0.1.0-alpha.17 (aac72f4) starting
2024-02-06T23:31:57.957965Z INFO reth::cli: Opening database path="/home/fiiiu/.local/share/reth/mainnet/db"
2024-02-06T23:31:57.966608Z INFO reth::cli: Configuration loaded path="/home/fiiiu/.local/share/reth/mainnet/reth.toml"
2024-02-06T23:31:57.967290Z INFO reth::cli: Database opened
2024-02-06T23:31:57.967673Z INFO reth::cli: Pre-merge hard forks (block based):
- Frontier @0
- Homestead @1150000
- Dao @1920000
- Tangerine @2463000
- SpuriousDragon @2675000
- Byzantium @4370000
- Constantinople @7280000
- Petersburg @7280000
- Istanbul @9069000
- MuirGlacier @9200000
- Berlin @12244000
- London @12965000
- ArrowGlacier @13773000
- GrayGlacier @15050000
Merge hard forks:
- Paris @58750000000000000000000 (network is not known to be merged)
Post-merge hard forks (timestamp based):
- Shanghai @1681338455
2024-02-06T23:31:58.047122Z INFO reth::cli: Transaction pool initialized
2024-02-06T23:31:58.047221Z INFO reth::cli: Connecting to P2P network
2024-02-06T23:31:58.047348Z INFO net::peers: Loading saved peers file=/home/fiiiu/.local/share/reth/mainnet/known-peers.json
2024-02-06T23:31:58.052383Z INFO reth::cli: Connected to P2P network peer_id=0x0e2117c393af8bb7c4d5c69204fe6538adf91c26096dd475023d4027722556d62bb8408a471ebdcab6754223944d32ec0fca4e12fc7770a6bf58176ba1343e01 local_addr=0.0.0.0:30303 enode=enode://0e2117c393af8bb7c4d5c69204fe6538adf91c26096dd475023d4027722556d62bb8408a471ebdcab6754223944d32ec0fca4e12fc7770a6bf58176ba1343e01@127.0.0.1:30303
2024-02-06T23:31:58.052766Z INFO reth::cli: Pruner initialized prune_config=PruneConfig { block_interval: 5, segments: PruneModes { sender_recovery: Some(Full), transaction_lookup: None, receipts: Some(Before(11052984)), account_history: Some(Distance(10064)), storage_history: Some(Distance(10064)), receipts_log_filter: ReceiptsLogPruneConfig({0x00000000219ab540356cbb839cbe05303d7705fa: Before(11052984)}) } }
2024-02-06T23:31:58.052928Z INFO reth::cli: Consensus engine initialized
2024-02-06T23:31:58.052982Z INFO reth::cli: Engine API handler initialized
2024-02-06T23:31:58.057626Z INFO reth::cli: RPC auth server started url=127.0.0.1:8551
2024-02-06T23:31:58.057723Z INFO reth::cli: RPC IPC server started url=/tmp/reth.ipc
2024-02-06T23:31:58.057729Z INFO reth::cli: RPC HTTP server started url=127.0.0.1:8545
2024-02-06T23:31:58.057748Z INFO reth::cli: Starting consensus engine
2024-02-06T23:32:01.054768Z INFO reth::cli: Status connected_peers=0 freelist=145148 latest_block=19171394
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
Can you please run it in a debugger and print a backtrace?
running with RUST_LOG=trace could also be useful
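Since trace output can get very large, here is a minimal sketch for keeping only the tail of a run. The helper name `capture_tail` and the output file name are made up for illustration, and `reth node --full` is only an assumed invocation based on the node type reported above; substitute whatever command you normally start reth with.

```shell
# Hypothetical helper: run a command and keep only the last N lines of its
# combined stdout/stderr, so the crash message ends up in the saved tail.
capture_tail() {
    local lines="$1" out="$2"
    shift 2
    "$@" 2>&1 | tail -n "$lines" > "$out"
}

# Example (assumed invocation; adjust to your own setup):
# capture_tail 10000 reth-trace-tail.log env RUST_LOG=trace reth node --full
```

Nothing is written to the output file until the command exits, because `tail` has to see the end of the stream first.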
will do.
there seem to be two different scenarios:
- if I launch reth and only later I launch the consensus client (Prysm), the crash happens after the bad block (first log)
- if I launch reth with Prysm already running, it crashes immediately (second log)
these are the last few lines before the crash in case 2 above, running with RUST_LOG=trace:
2024-02-06T23:46:43.658231Z TRACE batch: jsonrpsee_core::tracing: send="{"jsonrpc":"2.0","result":{"hash":"0x7c1920e142ce19d4c9d1212351c855da363e9fa778f7ee0efbb15170263691e1","parentHash":"0x5d5edad02ba16a3e1a5c6f6cdaa54ae59055d7bf3750fbad76bb0d0dbe128e29","sha3Uncles":"0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347","miner":"0x95222290dd7278aa3ddd389cc1e1d165cc4bafe5","stateRoot":"0x1ef8f936c761f20a0d02ceefda862a2c6462d87e4ec445aadb1b7d3428b79c8a","transactionsRoot":"0x5838ba36db560dd922cef590305772870a6e747ea742cb1ee6fee4c332ce0004","receiptsRoot":"0x1769b395a741e8bd64246b38b9ad7b3f97b4a94605b3952822f030e97870778c","logsBloom":"0x2329a3f6e1c4b3001b5b54fc9d639afcf9fe12380b4118d8029120709ea670b513a59dee184260247eba1d1e25c5b188ae6110a9bb16a9224e74fc8a862e13030118451c341d9f8a6ec76ef9c6a1fb3006437099db4d8c83ed380ed499efd990324a7404b7fd6ac1e4a272798a6e8cc1f7020255888546439616eb13855964d31a0e9b5246058c3cb678a0305cc8d946748124d5e7c391ed4865016c00fc4456f341192799c0a4810a44d0df08fa95dcf5940b6a4c3252005774664f215d023926f5855b12530072a140e398845cf02d0972305c30c96a1a2a4a3b8fa962f2bfdcf471593bc006a89e2df78315739dca2e68f31e1bcc04461f2d502152a5bd29","difficulty":"0x0","number":"0x1247bd9","gasLimit":"0x1c9c380","gasUsed":"0xcf7b2b","timestamp":"0x65c1f743","extraData":"0x6265617665726275696c642e6f7267","mixHash":"0x31c58e614e01f4d2780488d1041284a105041a7b41a9f2d5317751dce54feb79","nonce":"0x0000000000000000","baseFeePerGas":"0x1cf775e7ab","withdrawalsRoot":"0x62cb2730e4c20c6d8d06dc73f668e79c579b1358d3de37d1f182c8a75de412f0","totalDifficulty":"0xc70d815d562d3cfa955","uncles":[],"transactions":["0xbd9691a03ef844120a0ae585426ba54295e4b1e108fb014a846f62d77c0c9071","0x516353db6e9536eb92048e2d3b4f1075adf2be67832c76c853c05ef0814832b0","0x3fdb1103ff85828ee470117702c856f45ce36ea856cb8b3924b4dbd862594431","0x4b9f73654542300674d838b2ea6cebfb6cc60b3bfeb92c43c3f95755c95ce2de","0xcd0e63c9b0b3c2f601d69c6de8e4ccc76bda6bbf5a8ef654a82be4f25e43f91e","0x56f79cfdae6400e6bb3d8140f8
b6d443756b15474997ae7bbe8ac2c8c6006b4d","0xb680b0b2a172fac89653f41ce3047b1b952eb62d2aabc4b70d700a739e44d58c","0xd7619a4dfcdd898f0f6ba0cc49248b0fb8f3561bc14108335180a667cd4bfbfb","0x36175829103d5635c834bf6fb8eb27524b02c0da6ee649169705d33213323c93","0x24e7ae80f51b4f6b0a421d9d8c50155bf2d21ea93b62587f666b074f91a59cb0","0xeef644258e9e5cd4ffc60d51a081256de33848836563c689e493eab7315ed284","0x507b35d0c31f1c12622903f4dfef3abc96b431c7847549cfe62047837bf55e3e","0x519272236f80ff4b4a6ec039551962956831f59a57be49dc25ebd92212338f7d","0xc06fd4f07555013e0709e655cda128a2c0428693440cf6aeb91f0ba6e5dff9f0","0xc73852c17b85676ab05b482911d537ca897b0e2ddefb9166368a78a5642cfa26","0x3690353a87e6e516ab089735362fe6b0cb759175ff69a6423f58c71374292b8a","0xbc83a0f40e32f0847ec29b32f55a8ff37848c4855625a2c130494e689d286fb6","0x8748bfa692df550dbe4ada386b068fb92f43254dca6719ec896d1f55314c0476","0x8ed7caf0f94c4bfb5f2584fbded0f2eb609fb337461b489ae7b13fe8f9b29e76","0x9365a846148f8f14685b5fe35c2cd7ab46e729bec1274a284ed23cd0dbf73e5d","0xd4191c2aa73df647882b8fa12c1da93ac86a3a1d5725f85cf413a09e1ae7ad39","0x71bd87437ff46eec644aefdd2e46e92e7a9fe7db7ecedfe02421968ca76bc051","0x08b9e24b359f00c23ef15dd022be5af42ed9d608d277e392ba5bf03bc81e597a","0x97161fcc6df99e0bf278307c72a314dbaeb398bbc576b563aaaed0dc72242b13","0x07267446898580abb8120f71681023c86a44cf697f3f56b788c88eda88295834","0x299c83cced947f69831ba5a53d599d065922555bedec07bceba1e25ec8fd6dff","0x8805580c95e426e9ff0ac29886c53cdb694f3f7c3312d8ca79cdbb21e5c534b7","0x1aae630db5a8adffb4ff917bec6399c29454c524e272a39406b34d5d5a57d1f2","0x6f62c9b177676b74ca49338aa7c683a2448e4ca492eaae8d79fdcb805d3e9692","0x0637a1fad49134b8e3a65a657f1c90f0a67a0e6be80bc3d37185c9a2018a1e05","0x53adfb0cda0a4b11d32356ca2094e8eb3c54177ef8b407733a954245676d918d","0x204bd55fbf71e8de396d235879ed1dd1f7d37ce296469bd45d89353c03958b6c","0x2fbc7a0764d2ae41b3747b62f560ee8b59f8e23048875669a6f9f9e0c10fe84a","0xad4aeead82cb2a9126b36ae77ae3bfc1e41dc87ff7df3995e258b638edc1d06d","0xa193444f940136c6179d8ffd5
b1d4784f18bcd66f11c67b15e99aabc977fe122","0xa7c64933e27ebb2fb55ef209fa898b3bb56ba5e7cb654c4b364c77e988d181b7","0xb206433ad56f614c5b74e2834be12b241927cefdc6a434d6a1471c1"
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
can you share more logs leading up to that, please
will do. is copy pasting here the best for you? happy to send the whole logs over too.
as file attached here also works
these are the last 200 lines of the logs for case 1. where it crashes after launching Prysm:
2024-02-06T23:56:46.768295Z TRACE on_new_payload{block_hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce block_number=19171395 is_pipeline_idle=true}:try_insert_new_payload:try_insert_validated_block{block=(19171395, 0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce)}:try_append_canonical_chain:loop{i=0 current="Nibbles("0601")" build_extensions=false}: trie::hash_builder: skipping 2 nibbles
2024-02-06T23:56:46.768304Z TRACE on_new_payload{block_hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce block_number=19171395 is_pipeline_idle=true}:try_insert_new_payload:try_insert_validated_block{block=(19171395, 0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce)}:try_append_canonical_chain:loop{i=0 current="Nibbles("0601")" build_extensions=false}: trie::hash_builder: short_node_key=Nibbles("")
2024-02-06T23:56:46.768313Z TRACE on_new_payload{block_hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce block_number=19171395 is_pipeline_idle=true}:try_insert_new_payload:try_insert_validated_block{block=(19171395, 0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce)}:try_append_canonical_chain:loop{i=0 current="Nibbles("0601")" build_extensions=false}: trie::hash_builder: pushing branch node hash hash=0x3f075238a5c44b667fafcdf86d08def8aa36b7fc4c15108b6f7cd0220ac3c901
2024-02-06T23:56:46.768324Z TRACE on_new_payload{block_hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce block_number=19171395 is_pipeline_idle=true}:try_insert_new_payload:try_insert_validated_block{block=(19171395, 0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce)}:try_append_canonical_chain:loop{i=0 current="Nibbles("0601")" build_extensions=false}: trie::hash_builder: no common prefix to create branch nodes from, returning
2024-02-06T23:56:46.768333Z TRACE on_new_payload{block_hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce block_number=19171395 is_pipeline_idle=true}:try_insert_new_payload:try_insert_validated_block{block=(19171395, 0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce)}:try_append_canonical_chain: trie::hash_builder: old key/value key=Nibbles("0601") value=Hash(0x3f075238a5c44b667fafcdf86d08def8aa36b7fc4c15108b6f7cd0220ac3c901)
2024-02-06T23:56:46.768343Z TRACE on_new_payload{block_hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce block_number=19171395 is_pipeline_idle=true}:try_insert_new_payload:try_insert_validated_block{block=(19171395, 0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce)}:try_append_canonical_chain: trie::hash_builder: new key/value key=Nibbles("0602") value=Hash(0x649beaac41bd2470e78e55833a0c028f73b53e01df5a644aa2b5baec4f254642)
2024-02-06T23:56:46.768355Z TRACE on_new_payload{block_hash=0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce block_number=19171395 is_pipeline_idle=true}:try_insert_new_payload:try_insert_validated_block{block=(19171395, 0x844352ea2eb484f9a3dde277e4e3ce08652eb4e42cd2c4e2393e3bb522dfc3ce)}:try_append_canonical_chain: trie::hash_builder: updating merkle tree current=Nibbles("0602") succeeding=Nibbles("0603")
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
last 10000 lines of log for case 1, crashing after launching Prysm (full logs are ~1TB):
A backtrace would be really helpful @fiiiu https://github.com/paradigmxyz/reth/issues/6452#issuecomment-1930950508
You can run gdb <command>, enter r to start the program, and then enter bt once it crashes and gdb breaks
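The same gdb steps can be sketched as a one-shot, non-interactive command. The helper name `reth_backtrace` is made up for illustration, and `reth node --full` is only an assumed invocation; `-batch`, `-ex`, and `--args` are standard gdb options. The extra `thread apply all bt` is an addition to the steps above that also dumps the other threads, in case the overflowing thread is not the selected one.

```shell
# Hypothetical helper: run a command under gdb in batch mode and print
# backtraces once the program stops (for example on the stack overflow).
reth_backtrace() {
    gdb -batch \
        -ex run \
        -ex bt \
        -ex 'thread apply all bt' \
        --args "$@"
}

# Example (assumed invocation; substitute the command you normally run):
# reth_backtrace reth node --full > reth-backtrace.txt 2>&1
```

Because gdb runs in batch mode, it exits on its own after printing the backtraces, so the output file can be attached to the issue directly.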
ok so now I'm not getting it to crash, it simply complains that it doesn't see the consensus client, while the consensus client says that the execution client is not syncing. here's the trace, but I doubt it's useful. I'll keep trying to get it to crash again.
Thanks Alejo. Please keep it coming, we don't yet have enough information to identify where we should look further.
The consensus <> execution bug is interesting. Do you also have the lighthouse logs?
thanks for looking into this, yes I'll keep sending as I unearth new info. I don't have access rn but will share more later.
this is the error message I got from prysm (it doesn't crash, but just doesn't sync):
ERROR execution: Got a validation error in newPayload error=mismatched block state root: got 0xd3d609b5f2fd0910dd275537799ca8d3f47a32ec136a191aa32c56dd43833512, expected 0x7da550c658bd3cac17918db97e0c5231e39fa5f2991ddfb5f889431955ece45b
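The two hashes in this Prysm error are the same `got`/`expected` pair that reth printed in its `BodyStateRootDiff` warning earlier in the thread. A small sketch for checking that mechanically; the function name `state_roots` and the variable names are made up for illustration, and the two strings are fragments copied from the logs above.

```shell
# Print each "got"/"expected" state root found on stdin, one per line,
# normalising reth's "got: 0x.." and Prysm's "got 0x.." to the same form.
state_roots() {
    grep -oE '(got|expected):? 0x[0-9a-f]{64}' | sed -E 's/:? / /'
}

# Error fragments copied from the reth and Prysm logs in this thread.
reth_err='Consensus(BodyStateRootDiff(GotExpected { got: 0xd3d609b5f2fd0910dd275537799ca8d3f47a32ec136a191aa32c56dd43833512, expected: 0x7da550c658bd3cac17918db97e0c5231e39fa5f2991ddfb5f889431955ece45b }))'
prysm_err='mismatched block state root: got 0xd3d609b5f2fd0910dd275537799ca8d3f47a32ec136a191aa32c56dd43833512, expected 0x7da550c658bd3cac17918db97e0c5231e39fa5f2991ddfb5f889431955ece45b'

if [ "$(printf '%s\n' "$reth_err" | state_roots)" = "$(printf '%s\n' "$prysm_err" | state_roots)" ]; then
    echo "same got/expected pair"
else
    echo "different got/expected pair"
fi
```

A matching pair would be consistent with Prysm relaying reth's newPayload validation result instead of computing a different root itself, though that reading is an interpretation, not something the logs state outright.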
ok, here are debug logs from prysm.
there's a "Could not validate finalized root" in there, and also the error I shared above. I reported to the prysm team in case this came from their end but they pointed me back to reth :)
ok, just finished resyncing reth from scratch, now seems to be working properly. I'll close this for now.
updated to alpha.21, reth crashes upon running, reopening this issue but not sure it's the same thing.
ok I reverted to alpha.13, ran it until it resynced, went back to alpha.21 and now it seems to be running alright. not sure if/how the reversion worked, but I tried running alpha.21 several times and it always crashed :/
We seem to have run into the same issue with the same deployment (reth + Prysm) after we upgraded Prysm for the Dencun upgrade.
It looks like Prysm is sending a query that triggers the crash in reth. We flushed the Prysm state and ran it from scratch after restarting reth, and it seems to be working properly now.
This should no longer be happening. Given we have beta.1 out can we get you guys to switch over to that and confirm if the issue happens still?
I have uploaded the stacktrace here https://drive.google.com/file/d/10u1B9gZOSMnxzmQmhmLBgE2Um322DO4u/view?usp=drive_link
we are syncing another node with the beta build now, but given the time it takes to sync and the storage it requires, we can't simply spin up a new one to check whether we can reproduce it at will
@gakonst we are running into this on reth beta 2; it is triggered by simply restarting Prysm
This issue is stale because it has been open for 21 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.