Metrics endpoint blocks if reth gets stuck
Describe the bug
I was syncing reth. The service was running just fine, but somehow got "stuck"; the indication was that the log kept reporting checkpoint=0.
While reth was in that state, the /metrics endpoint was blocking and effectively unusable.
I would expect the metrics endpoint to keep working (is it running in the same thread as the main service?).
This is a conversation from Telegram on the topic:
So, for a node operator such as us at https://dwellir.com, being able to detect instances that have gotten "stuck" is hugely important.
Would it make sense to break out the thread that manages /metrics, so that we can discover situations where the reth process is running but is effectively "stuck"? This is pretty key for long-term operations.
Restarting reth a few times seems to get the node syncing again, but that means babysitting the sync process, which isn't feasible in larger-scale operations.
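For context, this is roughly the external watchdog we would want to rely on at scale. It is only a sketch under assumptions: the metrics address (127.0.0.1:9001) and the systemd unit name (reth) are placeholders for whatever a given deployment uses.

```sh
#!/usr/bin/env bash
# Sketch of a stuck-node watchdog (assumed address and unit name, adjust per deployment).
# Probes /metrics with a hard timeout; if the endpoint blocks, restarts reth.
METRICS_URL="http://127.0.0.1:9001/metrics"   # whatever --metrics is bound to
UNIT="reth"                                   # local service/unit name

if ! curl --silent --fail --max-time 10 "$METRICS_URL" > /dev/null; then
    echo "$(date -Is) /metrics did not answer within 10s, restarting ${UNIT}" >&2
    systemctl restart "${UNIT}"
fi
```

But this only helps if /metrics is served independently of the sync pipeline; today the probe just hangs together with the node.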
Steps to reproduce
- Install reth 1.0.3
- Install lighthouse
- Connect the instances.
- Start sync.
- Wait for the sync to get stuck (checkpoint=0)
- Try calling the /metrics endpoint with curl.
- The call blocks (error reproduced); see the curl example below.
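For the last step, a bounded curl call makes the hang observable (the address below is an assumption; use whatever --metrics is set to):

```sh
# On a healthy node this prints the Prometheus text output immediately;
# on a "stuck" node it hangs and exits with curl's timeout error (exit code 28).
curl --max-time 10 http://127.0.0.1:9001/metrics
```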
Node logs
Possibly this is where something goes sideways, as I can see checkpoint=0 showing up here.
2024-07-31T17:27:41.068486Z INFO reth_node_events::node: Preparing stage pipeline_stages=2/12 stage=Bodies checkpoint=20425000 target=20425694
2024-07-31T17:27:41.068545Z INFO reth_node_events::node: Executing stage pipeline_stages=2/12 stage=Bodies checkpoint=20425000 target=20425694
2024-07-31T17:27:41.419545Z INFO reth_node_events::node: Finished stage pipeline_stages=2/12 stage=Bodies checkpoint=20425694 target=20425694 stage_progress=100.00%
2024-07-31T17:27:41.434439Z DEBUG provider::static_file: Commit segment=Transactions path="/root/snap/reth/15/.local/share/reth/mainnet/static_files/static_file_transactions_20000000_20499999" duration=14.954467ms
2024-07-31T17:27:41.435961Z DEBUG storage::db::mdbx: Commit total_duration=1.408035ms commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 2, audit: 0, write: 0, sync: 87, ending: 0, whole: 90, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-07-31T17:27:41.436044Z INFO reth_node_events::node: Preparing stage pipeline_stages=3/12 stage=SenderRecovery checkpoint=0 target=20425694
2024-07-31T17:27:41.436055Z INFO reth_node_events::node: Executing stage pipeline_stages=3/12 stage=SenderRecovery checkpoint=0 target=20425694
2024-07-31T17:27:41.437466Z INFO sync::stages::sender_recovery: Recovering senders tx_range=0..5000003
2024-07-31T17:27:41.437470Z DEBUG sync::stages::sender_recovery: Recovering senders batch tx_range=0..100000
2024-07-31T17:27:41.437920Z DEBUG sync::stages::sender_recovery: Appending recovered senders to the database tx_range=0..100000
2024-07-31T17:27:42.199709Z DEBUG sync::stages::sender_recovery: Finished recovering senders batch tx_range=0..100000
2024-07-31T17:27:42.199724Z DEBUG sync::stages::sender_recovery: Recovering senders batch tx_range=100000..200000
2024-07-31T17:27:42.199975Z DEBUG sync::stages::sender_recovery: Appending recovered senders to the database tx_range=100000..200000
2024-07-31T17:27:42.961397Z DEBUG sync::stages::sender_recovery: Finished recovering senders batch tx_range=100000..200000
2024-07-31T17:27:42.961412Z DEBUG sync::stages::sender_recovery: Recovering senders batch tx_range=200000..300000
2024-07-31T17:27:42.961557Z DEBUG sync::stages::sender_recovery: Appending recovered senders to the database tx_range=200000..300000
2024-07-31T17:27:43.724572Z DEBUG sync::stages::sender_recovery: Finished recovering senders batch tx_range=200000..300000
2024-07-31T17:27:43.724587Z DEBUG sync::stages::sender_recovery: Recovering senders batch tx_range=300000..400000
2024-07-31T17:27:43.724714Z DEBUG sync::stages::sender_recovery: Appending recovered senders to the database tx_range=300000..400000
2024-07-31T17:27:43.757039Z INFO reth::cli: Status connected_peers=74 freelist=15 stage=SenderRecovery checkpoint=0 target=20425694
2024-07-31T17:27:44.470751Z DEBUG sync::stages::sender_recovery: Finished recovering senders batch tx_range=300000..400000
2024-07-31T17:27:44.470767Z DEBUG sync::stages::sender_recovery: Recovering senders batch tx_range=400000..500000
2024-07-31T17:27:44.470927Z DEBUG sync::stages::sender_recovery: Appending recovered senders to the database tx_range=400000..500000
2024-07-31T17:27:45.213033Z DEBUG sync::stages::sender_recovery: Finished recovering senders batch tx_range=400000..500000
2024-07-31T17:27:45.213048Z DEBUG sync::stages::sender_recovery: Recovering senders batch tx_range=500000..600000
2024-07-31T17:27:45.213171Z DEBUG sync::stages::sender_recovery: Appending recovered senders to the database tx_range=500000..600000
2024-07-31T17:27:45.955626Z DEBUG sync::stages::sender_recovery: Finished recovering senders batch tx_range=500000..600000
... and a bit later:
2024-07-31T22:39:30.580987Z DEBUG sync::stages::sender_recovery: Finished recovering senders batch tx_range=2458948819..2458980010
2024-07-31T22:39:30.581143Z INFO reth_node_events::node: Finished stage pipeline_stages=3/12 stage=SenderRecovery checkpoint=20425694 target=20425694 stage_progress=100.00% stage_eta=unknown
2024-07-31T22:39:30.741552Z DEBUG storage::db::mdbx: Commit total_duration=160.453687ms commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 2, audit: 0, write: 0, sync: 10496, ending: 0, whole: 10498, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-07-31T22:39:30.741675Z INFO reth_node_events::node: Preparing stage pipeline_stages=4/12 stage=Execution checkpoint=0 target=20425694
2024-07-31T22:39:30.741931Z INFO reth_node_events::node: Executing stage pipeline_stages=4/12 stage=Execution checkpoint=0 target=20425694
2024-07-31T22:39:30.747958Z DEBUG sync::stages::execution: Preparing static file producer start=1
2024-07-31T22:39:30.762505Z DEBUG provider::static_file: Commit segment=Receipts path="/root/snap/reth/15/.local/share/reth/mainnet/static_files/static_file_receipts_0_499999" duration=966.595µs
2024-07-31T22:39:30.765979Z DEBUG sync::stages::execution: Calculating gas used from headers range=0..=0
2024-07-31T22:39:30.769162Z DEBUG sync::stages::execution: Finished calculating gas used from headers range=0..=0 duration=14.802µs
2024-07-31T22:39:30.769166Z DEBUG sync::stages::execution: Calculating gas used from headers range=1..=20425694
2024-07-31T22:39:37.324614Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-31T22:39:48.314808Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-31T22:39:48.819466Z INFO reth::cli: Status connected_peers=100 freelist=8 stage=Execution checkpoint=0 target=20425694
2024-07-31T22:40:01.092377Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-31T22:40:12.617997Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-31T22:40:13.819023Z INFO reth::cli: Status connected_peers=100 freelist=8 stage=Execution checkpoint=0 target=20425694
2024-07-31T22:40:22.521493Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-07-31T22:40:22.521548Z INFO reth_node_events::node: Received forkchoice updated message when syncing head_block_hash=0x0a50b9d682a73afc17c2f7f2d3059bbb5dd5c8d94efa133955b366dd9a29a844 safe_block_hash=0x4028230abf10b1e44d628a2120a2240f60315d3cdc3d7b9e1e31f1a5934b8da6 finalized_block_hash=0x44f38f31819708389bf4455f6c8e5433e093cadb8d2bbaf86024c9eff0f9f7a3
Platform(s)
Linux (x86)
What version/commit are you on?
reth Version: 1.0.3
Commit SHA: 390f30aadebcdd509e72cc04327c3b854de076a6
Build Timestamp: 2024-07-18T22:01:22.170093482Z
Build Features: jemalloc
Build Profile: maxperf
What database version are you on?
reth db version
2024-08-01T11:12:59.670173Z INFO Initialized tracing, debug log directory: /root/snap/reth/15/.cache/reth/logs/mainnet
Current database version: 2
Local database version: 2
Which chain / network are you on?
mainnet
What type of node are you running?
Archive (default)
What prune config do you use, if any?
Can't find that section ([prune]).
If you've built Reth from source, provide the full command you used
CARGO_TARGET_DIR=target cargo build --bin reth --profile maxperf --features jemalloc
Code of Conduct
- [X] I agree to follow the Code of Conduct