Error: not able to find Headers static file for block
Describe the bug
The node was syncing, then it crashed after long-lived read transactions were timed out, and on restart it shut down with the "not able to find Headers static file" error.
The data directory is a Kubernetes PVC backed by an NFS volume via Google Filestore.
Steps to reproduce
- Sync a node using the Docker image on Kubernetes with an NFS-backed PVC for data
- Watch the logs for long-lived read transactions followed by an apparent restart (see the example command below)
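The long-lived read transaction warning in step 2 can be spotted by filtering the pod logs, e.g. (the pod name is a placeholder and this assumes logs go to stdout):
$ kubectl logs -f <reth-pod> | rg 'Long-lived read transaction has been timed out'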
Node logs
2024-04-02T13:15:50.097537Z DEBUG net::tx: sending `GetPooledTransactions` request to peer's session failed, buffering hashes peer_id="0x20d3…a495" failed_to_request_hashes={0xb1f33474e2549b96b7e66de637b8c506519ef033789a437ea30180f0a6e23349} conn_eth_version=Eth68 client_version=Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.6
2024-04-02T13:15:50.100572Z DEBUG net::tx: failed sending request to peer's session, buffering hashes peer_id="0x20d3…a495" failed_to_request_hashes=RequestTxHashes { hashes: {0xa5da245418158f0be389a2a484d6795743e954e3f9c74c2027b9ca39f64ccc01, 0xb1f33474e2549b96b7e66de637b8c506519ef033789a437ea30180f0a6e23349, 0xbbc592e731b4751629315bd50358ef16e887828077fdbfd880a917b55fd853c3} } conn_eth_version=Eth68
2024-04-02T13:15:50.127650Z DEBUG net::tx: sending `GetPooledTransactions` request to peer's session failed, buffering hashes peer_id="0x6620…e524" failed_to_request_hashes={0xfb97a7a47072a2576bf151b0ce17c5000f7d018eb863833a8a170cc47bf54127, 0x9cd803d229eb283406caa4f3922408fe1500ba81a9d6c038d5b46e086aca1604, 0x8ab61139a9dc29c1d9446126a3f60610a06b51b615908d6e736cc07e83a40063, 0x585bc731fbab56cb2f878717f227fe8a5c96a12d6da5cb92c1204453be8700ef, 0xa22d9f7dbc27a73df6cdfc7fc01d789d2f97c42be21679f2bd4f63be447c9a90, 0x3637d9923a2ba76053215d51804aa8862ca14b7ddffe8e87c46f029b2f666157, 0x355763a6aaad7694771334b0221187ee0b234dcc490835d964e6a7a4a66d40c9, 0xbe1899359218a1a49886e20ccd88c8a01fa860c42d7f7c6aa639f93609130fb6, 0xff168973202947ae37e94885ac85b76f24adb1924313d76056f462e3650b9b17, 0x794620d7e3afb156bb1e6b76b21e636a56ac490c0fa48f0faa0dfeed1e2bcbe4, 0x59e740879d4e5c27ea9ac90c2e3a25dbaae43f9c22b0b2c40b18c6045626f344, 0xd17185ddd59c91ee296db5cf6352eb8ccc1d0cb0aaf5380517e9ce38e803ca41, 0x7b567872521e407e520b07177769315122df633e016fd6479d218e42ef1c74fc} conn_eth_version=Eth68 client_version=Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2
2024-04-02T13:15:50.129965Z DEBUG net::tx: failed sending request to peer's session, buffering hashes peer_id="0x6620…e524" failed_to_request_hashes=RequestTxHashes { hashes: {0x7b567872521e407e520b07177769315122df633e016fd6479d218e42ef1c74fc, 0x3ee7c1a34eca5a9f796d43d656d601ba4e46b4624f9a1c774ad4703a38b8815c, 0x355763a6aaad7694771334b0221187ee0b234dcc490835d964e6a7a4a66d40c9, 0x3637d9923a2ba76053215d51804aa8862ca14b7ddffe8e87c46f029b2f666157, 0xfb97a7a47072a2576bf151b0ce17c5000f7d018eb863833a8a170cc47bf54127, 0x59e740879d4e5c27ea9ac90c2e3a25dbaae43f9c22b0b2c40b18c6045626f344, 0x8ab61139a9dc29c1d9446126a3f60610a06b51b615908d6e736cc07e83a40063, 0xce013dec228fa243ebb8d9e55f0a7aa55d3f5e81e9abcb11a0a2eea94b560acd, 0x9cd803d229eb283406caa4f3922408fe1500ba81a9d6c038d5b46e086aca1604, 0x63ef2e59e4fbe27c330ae1407754cc2abfa3b907316458e54fc2d4a01accef02, 0x59ea08878b6af4f88bd30eb99427fe472d1ae9a85a1a264d39085d6905cd17e0, 0x794620d7e3afb156bb1e6b76b21e636a56ac490c0fa48f0faa0dfeed1e2bcbe4, 0x585bc731fbab56cb2f878717f227fe8a5c96a12d6da5cb92c1204453be8700ef, 0xe9ed83069f4a4dfa75d4654d43c9ea2a983a1f4c2f2d2b6dd4b8787561ca02ef, 0xff168973202947ae37e94885ac85b76f24adb1924313d76056f462e3650b9b17, 0x6197497a23b4bded562a7da10851877fcc5684fcebb086013b650dd07d34d5d2, 0xa22d9f7dbc27a73df6cdfc7fc01d789d2f97c42be21679f2bd4f63be447c9a90, 0xbe1899359218a1a49886e20ccd88c8a01fa860c42d7f7c6aa639f93609130fb6, 0xffa03e5c59cc27e08c572f6186517570057d11886f2d4d38a17ca4f067dc7f81, 0xd17185ddd59c91ee296db5cf6352eb8ccc1d0cb0aaf5380517e9ce38e803ca41} } conn_eth_version=Eth68
2024-04-02T13:15:50.152319Z DEBUG net::tx: sending `GetPooledTransactions` request to peer's session failed, buffering hashes peer_id="0x7e6d…3051" failed_to_request_hashes={0xcb64348c2d92c4903d4b5181c4d172e45d9d7690da629a85895df54b4117a5c2} conn_eth_version=Eth68 client_version=Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.7
2024-04-02T13:15:50.152590Z DEBUG net::tx: sending `GetPooledTransactions` request to peer's session failed, buffering hashes peer_id="0xe229…3d4d" failed_to_request_hashes={0x59c3019c670a6a3d0a731ba113f9770a5b9286ecb524d6aceb339faf8a8b0abb, 0x611717f6cafcfbe8f45d09550e329608da3f98d38ab8e0086af813dce84c51d1, 0xa7cf8b795085c5433b61a5dd2f5cc5b915ebae4fae885db383c3163e4331e2c9, 0x308d9d7487205a40b311284e2210d40c9371251fbb861a942269421f7f120a6c, 0x4ac0478d7683d217ce943a3308e7d9a244d1ce33b864c4f1dad65296010c6546} conn_eth_version=Eth68 client_version=Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2
2024-04-02T13:15:50.154367Z DEBUG net::tx: failed sending request to peer's session, buffering hashes peer_id="0xe229…3d4d" failed_to_request_hashes=RequestTxHashes { hashes: {0x00393d94d268405681d9068203790121d00a5bac408d80e1f5180b2414318fc3, 0x6945b41296423afe1a82ac1933c42a5aad6796432d845aca9d6354dfdcc0569b, 0x66392986d72ffe3ebfb9079b7c9ea4e999d19a7de818400fde00300c45fbbaac, 0x7ff34560287743c6b2051fb6787ab26cd70116241d68dd590077315dd6792902, 0xdbc4ca73d39caddedaf5e7bc5f04e05d49319f20a5c06462af9b083ac956e329, 0x1db6e829d6bc8ab874f1bde170a30d7f08a31a788ca0c5ee6781ec967cc6fb82, 0x00d50e18c9ab8aaef0c07439b0914afe67e14b82a9a389487aecb87baadfe6f9, 0xc982d1cde7d88a8f6582003570367b71c5154f72c636a5dcd9c79c7f6a7387c3, 0x308d9d7487205a40b311284e2210d40c9371251fbb861a942269421f7f120a6c, 0xd1a4ce9400cedc5d863f35ee93b157e1814a483df560ecbcbfea49b4552e082a, 0xa7cf8b795085c5433b61a5dd2f5cc5b915ebae4fae885db383c3163e4331e2c9, 0x150674df2e4f10abe209b6137d0022472912eaa5b28350cec238a4a446d274bd, 0xbcee3fb3b7cf30e81adaaf6bd00c59f619020a298a2390009d33efa71ecf4598, 0x59c3019c670a6a3d0a731ba113f9770a5b9286ecb524d6aceb339faf8a8b0abb, 0x611717f6cafcfbe8f45d09550e329608da3f98d38ab8e0086af813dce84c51d1, 0x5baff86a84dd57554fa738b6ce9fe8ab3bb3d34ca9cd620e41d21b2ebedbc817, 0x6277784a783ac88d313d8a263167b428d2b6937191b67043cf11e1f5a7d057bb, 0x4ac0478d7683d217ce943a3308e7d9a244d1ce33b864c4f1dad65296010c6546} } conn_eth_version=Eth68
2024-04-02T13:15:50.560940Z TRACE rpc::eth: Serving eth_call request=TransactionRequest { from: Some(0x0000000000000000000000000000000000000000), to: Some(0x8315177ab297ba92a06054ce80a67ed4dbd7ed3a), gas_price: None, max_fee_per_gas: None, max_priority_fee_per_gas: None, max_fee_per_blob_gas: None, gas: None, value: None, input: TransactionInput { input: Some(0xeca067ad), data: None }, nonce: None, chain_id: None, access_list: None, transaction_type: None, blob_versioned_hashes: None, sidecar: None, other: OtherFields { inner: {} } } block_number=Some(Number(Number(19568322))) state_overrides=None block_overrides=None
2024-04-02T13:15:53.975584Z WARN libmdbx: Long-lived read transaction has been timed out open_duration=300.000088517s
2024-04-02T13:15:57.967510Z WARN libmdbx: Long-lived read transaction has been timed out open_duration=300.000246831s
2024-04-02T14:58:36.375077Z INFO reth::cli: Starting reth version="0.2.0-beta.4 (c04dbe6e9)"
2024-04-02T14:58:36.375723Z INFO reth::cli: Opening database path="/data/db"
2024-04-02T14:58:47.051432Z INFO reth::cli: Configuration loaded path="/data/config.toml"
2024-04-02T14:58:47.165333Z INFO reth::cli: Database opened
2024-04-02T14:58:47.165370Z INFO reth::cli: Starting metrics endpoint addr=0.0.0.0:9001
2024-04-02T14:58:47.165484Z DEBUG reth::cli: Initializing genesis chain=mainnet genesis=0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
2024-04-02T14:58:47.171081Z DEBUG reth_node_core::init: Genesis already written, skipping.
2024-04-02T14:58:47.171140Z INFO reth::cli:
Pre-merge hard forks (block based):
- Frontier @0
- Homestead @1150000
- Dao @1920000
- Tangerine @2463000
- SpuriousDragon @2675000
- Byzantium @4370000
- Constantinople @7280000
- Petersburg @7280000
- Istanbul @9069000
- MuirGlacier @9200000
- Berlin @12244000
- London @12965000
- ArrowGlacier @13773000
- GrayGlacier @15050000
Merge hard forks:
- Paris @58750000000000000000000 (network is known to be merged)
Post-merge hard forks (timestamp based):
- Shanghai @1681338455
- Cancun @1710338135
2024-04-02T14:58:47.171162Z DEBUG reth::cli: Spawning stages metrics listener task
2024-04-02T14:58:47.182386Z WARN provider::static_file: Could not find block or tx number on a range request segment=Headers number=19568228
2024-04-02T14:58:47.182470Z ERROR reth::cli: shutting down due to error
Platform(s)
Linux (x86)
What version/commit are you on?
0.2.0-beta.4 (c04dbe6e9)
What database version are you on?
Current database version: 2
Local database version: 2
What type of node are you running?
Archive (default)
What prune config do you use, if any?
N/a
If you've built Reth from source, provide the full command you used
No response
Code of Conduct
- [X] I agree to follow the Code of Conduct
Here are logs for a "WARN Could not find block ...". In that zip file are reth.log
and reth.log.starting-reth-version.could-not-find-block,
the latter of which was created with:
kendall@ethel 24-04-09T18:38:53.440Z:~/eth2-data/reth/holesky
$ rg 'Starting reth version|Could not find block' reth.log > reth.log.starting-reth-version.could-not-find-block
kendall@ethel 24-04-09T18:39:20.716Z:~/eth2-data/reth/holesky
When you look at reth.log.starting-reth-version.could-not-find-block,
the file has 3350 "Starting reth" lines from "2024-04-08T17:26:10.127770Z" to "2024-04-09T00:42:05.917330Z". After "2024-04-09T00:42:05.917330Z" I stopped reth when I discovered that my node was rebooting because of the "Failed to insert block" issue that's being tracked in issue 7430. Eventually I dropped the merkle stage and restarted reth at "2024-04-09T03:47:22.649347Z".
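For anyone following along, dropping the merkle stage is typically done with reth's stage subcommand; a rough sketch, where the exact arguments may differ between versions:
$ reth stage drop --datadir ~/eth2-data/reth/holesky --chain holesky merkle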
Then today I thought I'd add a panic when an InsertBlockError occurs. I did that, but accidentally started reth.service,
which has the unmodified version "0.2.0-beta.5 (54f75cdcc)". I stopped it and then started my modified version "0.2.0-beta.5-dev (cd1c2bf99)", and then we see the "Could not find block".
So I'm guessing the problem related to this issue occurred some short time between "2024-04-09T17:21:33.262346Z" (i.e. when I stopped reth.service) and "2024-04-09T17:24:14.711716Z", when the "Could not find block" happened.
2024-04-08T17:26:10.127770Z INFO reth::cli: Starting reth version="0.2.0-beta.5 (54f75cdcc)"
...
2024-04-09T00:42:05.917330Z INFO reth::cli: Starting reth version="0.2.0-beta.5 (54f75cdcc)"
2024-04-09T03:47:22.649347Z INFO reth::cli: Starting reth version="0.2.0-beta.5 (54f75cdcc)"
2024-04-09T17:21:33.262346Z INFO reth::cli: Starting reth version="0.2.0-beta.5 (54f75cdcc)"
2024-04-09T17:24:14.670824Z INFO reth::cli: Starting reth version="0.2.0-beta.5-dev (cd1c2bf99)"
2024-04-09T17:24:14.711716Z WARN provider::static_file: Could not find block or tx number on a range request segment=Headers number=1311784
Hopefully these logs and maybe my observations are helpful.
Looking through the logs, it seems like yours presented differently than mine, and considerably sooner; then again, it seems you are also using holesky?
Yes I'm on holesky
This issue is stale because it has been open for 21 days with no activity.
This issue is still present in beta.6. My node's logs:
The RPC endpoint became unresponsive, so I restarted reth via systemd, and it was broken when it started back up.
What is the best way to go about fixing my node? Can I rewind or will I require a resync?
@vincenttaglia looking at your logs, it could be that the prune checkpoint didn't save properly when the process was killed. It would be good to enable trace log level for the rpc target, to try and get more info on what work the node is doing when the RPC is unresponsive. Other interesting targets would be consensus and pruner:
RUST_LOG=rpc,consensus,pruner=trace <your command to run the node>
Also the targets provider::static_file and static_file.
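Combining all of those, the invocation would look roughly like:
RUST_LOG=rpc=trace,consensus=trace,pruner=trace,provider::static_file=trace,static_file=trace <your command to run the node>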
It happened again yesterday. I have trace logs on, but I'm not sure my logs caught anything from the period when it started to go unresponsive. Nevertheless, here are my logs with the rpc::eth lines removed to fit in a gist. I can also upload the full logs if that would help debug.
I am using reth for indexing, so there are a lot of RPC calls, but I would expect the node to drop some of those queries before dropping the sync procedure.
https://gist.github.com/vincenttaglia/4e6d678563ebef07543d5e0bb2d6219d
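For reference, stripping the rpc::eth lines can be done with a simple inverse match; the output filename below is just an example:
$ rg -v 'rpc::eth' reth.log > reth.log.no-rpc-eth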
Same error... my server shut down unexpectedly, and on restart reth cannot start, failing with "ERROR shutting down due to error" Error: not able to find Headers static file for block number 19979823
How can I fix my node, please?
The last canonicalized head is logged as block 19917981:
May 21 11:27:52 reth start-reth.sh[1615]: 2024-05-21T11:27:52.334848Z DEBUG consensus::engine: Canonicalized new head hash=0x2312348964cf8cc1a2f7cfbbab7ff8210c2dedcb0852cbd2d8a47e6504f6831a number=19917981
so I'm wondering why, on startup, the node looks for block 19922274:
May 23 15:26:50 reth start-reth.sh[59660]: Error: not able to find Headers static file for block number 19922274
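One way to check which header ranges actually exist on disk is to list the static-file segments under the datadir; the directory layout and file naming here are an assumption and may differ between versions:
$ ls <datadir>/static_files | rg -i headers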
Possibly related to https://github.com/paradigmxyz/reth/issues/8203.
Could you try to repro again with trace enabled for the targets provider::static_file and static_file as well, or just enabled for all targets, so we can get all the available context for the bug @vincenttaglia?
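For example, something like this (or simply RUST_LOG=trace to enable every target):
RUST_LOG=provider::static_file=trace,static_file=trace <your command to run the node>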
This issue is stale because it has been open for 21 days with no activity.
This should theoretically not happen anymore. Feel free to reopen if it still does in the latest version.