reth connection to lighthouse CL client eventually has timeouts
Describe the bug
Everything runs smoothly at first and I'm fully synced with the reth EL and lighthouse CL clients, but after a while the beacon chain falls out of sync and appears to have trouble connecting to the reth execution client (see logs below).
Steps to reproduce
- build reth from source
- on Ubuntu 20.04 machine with 16GB RAM and 2TB SSD (a GCP VPS)
- run reth and CL client together (lighthouse v4.5.0 Little Dipper)
- wait (oddly, it works for a while and only starts failing some time later)
Node logs
lighthouse[249164]: Jan 09 13:00:32.395 WARN Execution engine call failed error: HttpClient(url: http://127.0.0.1:8551/, kind: timeout, detail: operation timed out), service: exec
lighthouse[249164]: Jan 09 13:00:32.395 WARN Error whilst processing payload status error: Api { error: HttpClient(url: http://127.0.0.1:8551/, kind: timeout, detail: operation timed out) }, service: exec
lighthouse[249164]: Jan 09 13:00:32.395 CRIT Failed to update execution head error: ExecutionForkChoiceUpdateFailed(EngineError(Api { error: HttpClient(url: http://127.0.0.1:8551/, kind: timeout, detail: operation timed out) })), service: beacon
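The timeouts above all come from Lighthouse's HTTP calls to the authenticated Engine API on port 8551. A quick way to tell whether reth itself is unresponsive (vs. a CL-side problem) is to probe that endpoint directly. The sketch below builds the HS256 JWT the Engine API auth spec requires from the shared hex secret; the secret value here is a placeholder, and the datadir/jwt path varies by setup (match it to reth's `--authrpc.jwtsecret`):

```python
# Hedged diagnostic sketch: build an Engine API auth token by hand.
# The "aa" * 32 secret below is a placeholder for illustration only;
# read your real secret from the jwt.hex file reth was started with.
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def engine_jwt(secret_hex: str) -> str:
    """HS256 JWT with an 'iat' claim, as the Engine API auth spec requires."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"iat": int(time.time())}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(bytes.fromhex(secret_hex), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"


if __name__ == "__main__":
    token = engine_jwt("aa" * 32)  # placeholder secret
    print(token)
```

With the printed token you can then time a cheap call such as `engine_exchangeCapabilities` via `curl -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8551` and see whether it responds promptly while Lighthouse is reporting timeouts.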
Platform(s)
No response
What version/commit are you on?
Version: 0.1.0-alpha.13
Commit SHA: a5a0fff5
What database version are you on?
Current database version: 1
Local database is uninitialized
What type of node are you running?
Full via --full flag
What prune config do you use, if any?
No response
If you've built Reth from source, provide the full command you used
cargo install --locked --path bin/reth --bin reth
Code of Conduct
- [X] I agree to follow the Code of Conduct
Seeing the same here on alpha 16
This issue is stale because it has been open for 21 days with no activity.
Seeing the same with Lighthouse 5.1.3 and Reth Beta 6 (built from source natively, with maxperf for Reth), full sync, Ubuntu 22.04, 16 cores / 32GB RAM, arm64.
Seeing the same with Lighthouse 5.1.3 and Reth Beta 7 (built from source natively, with maxperf for Reth), full sync, Fedora 40, 16 cores / 32GB RAM, x86 (AMD 3950x).
Tested the May 12 (c2a05f07d) and May 15 (aefcfff25) builds.
Restarting reth and lighthouse lets them sync closer to the head, but they then hit timeouts a few blocks (20-50) from the head.
No Reth errors running with:
logging_level="info,net=error,reth::node::events=error,consensus::engine=debug"
Lighthouse errors:
May 15 19:04:58.474 CRIT Failed to update execution head error: ExecutionForkChoiceUpdateFailed(EngineError(Api { error: HttpClient(url: http://127.0.0.1:8551/, kind: timeout, detail: operation timed out) })), service: beacon
EDIT1: looking at detailed lighthouse logs, I see crashes due to it running out of space on my ~/.cargo drive.
EDIT2: still happening after making space.
Are you using the default Fedora file system options (btrfs, LUKS encryption)? I was having very similar issues in Fedora 39 on robust hardware. After moving the Reth data volume to an unencrypted ext3 file system, timeouts are no longer an issue.
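For anyone checking whether the filesystem theory applies to their setup, a quick sketch (the datadir path below is the reth default and is an assumption; substitute your own):

```shell
# Print the filesystem type backing the reth datadir.
# Falls back to $HOME if the datadir doesn't exist on this machine.
DATADIR="${RETH_DATADIR:-$HOME/.local/share/reth}"
# %T prints the filesystem type name, e.g. btrfs, ext2/ext3, xfs
stat -f -c %T "$DATADIR" 2>/dev/null || stat -f -c %T "$HOME"
```

If this prints `btrfs` (especially on top of LUKS), that matches the configuration the timeouts were observed on.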
I've been running reth successfully on this hardware since alpha 1, including the last several weeks on Fedora 40. The recent changes were updating Lighthouse and running Erigon separately on the same machine (which may be interfering). Will investigate further.
Related #7322
I'm marking this as resolved because, as of 1.1.0, the default engine handler no longer suffers from this design flaw.