reth connection to lighthouse CL client eventually has timeouts

Open kassandraoftroy opened this issue 1 year ago • 6 comments

Describe the bug

Everything runs smoothly at first and I'm synced with the reth EL and lighthouse CL clients. After a while, the beacon chain falls out of sync and appears to have trouble connecting to the reth execution client (see logs below).

Steps to reproduce

  • build reth from source
  • on Ubuntu 20.04 machine with 16GB RAM and 2TB SSD (a GCP VPS)
  • run reth and CL client together (lighthouse v4.5.0 Little Dipper)
  • wait (weirdly, it works for a while and then starts to fail later)

Node logs

lighthouse[249164]: Jan 09 13:00:32.395 WARN Execution engine call failed            error: HttpClient(url: http://127.0.0.1:8551/, kind: timeout, detail: operation timed out), service: exec
lighthouse[249164]: Jan 09 13:00:32.395 WARN Error whilst processing payload status  error: Api { error: HttpClient(url: http://127.0.0.1:8551/, kind: timeout, detail: operation timed out) }, service: exec
lighthouse[249164]: Jan 09 13:00:32.395 CRIT Failed to update execution head         error: ExecutionForkChoiceUpdateFailed(EngineError(Api { error: HttpClient(url: http://127.0.0.1:8551/, kind: timeout, detail: operation timed out) })), service: beacon
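To see how often these timeouts occur and which Lighthouse service reports them, the log lines can be tallied with a small script. This is only a rough sketch: the regular expression is written against the three lines shown above, and the helper names `parse_line` and `summarize` are made up for illustration, so other Lighthouse log layouts may need adjustments.

```python
import re
from collections import Counter

# Layout assumed from the log lines above:
# "<Mon DD HH:MM:SS.mmm> <LEVEL> <message>   error: <details>, service: <name>"
LINE_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]{4}) "
    r"(?P<msg>.*?)\s+error: (?P<error>.*), service: (?P<service>\w+)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Flag the engine API timeouts reported in this issue.
    record["timeout"] = "kind: timeout" in record["error"]
    return record


def summarize(lines):
    """Count timeout lines per (service, message) pair."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["timeout"]:
            counts[(record["service"], record["msg"])] += 1
    return counts
```

Feeding it the output of the Lighthouse service log (for example, lines read from a saved log file) gives a per-service count that makes it easier to tell whether the timeouts cluster around a particular time or call.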

Platform(s)

No response

What version/commit are you on?

Version: 0.1.0-alpha.13 Commit SHA: a5a0fff5

What database version are you on?

Current database version: 1
Local database is uninitialized

What type of node are you running?

Full via --full flag

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

cargo install --locked --path bin/reth --bin reth

Code of Conduct

  • [X] I agree to follow the Code of Conduct

kassandraoftroy avatar Jan 09 '24 14:01 kassandraoftroy

Seeing the same here on alpha 16

0xAlcibiades avatar Jan 29 '24 03:01 0xAlcibiades

This issue is stale because it has been open for 21 days with no activity.

github-actions[bot] avatar Feb 20 '24 01:02 github-actions[bot]

Seeing the same with Lighthouse 5.1.3 and Reth Beta 6 (built from source for native and using maxperf for Reth), full sync, Ubuntu 22.04, 16 cores, 32 GB RAM, arm64.

8times4 avatar Apr 28 '24 10:04 8times4

Seeing the same with Lighthouse 5.1.3 and Reth Beta 7 (built from source for native and using maxperf for Reth), full sync, Fedora 40, 16 cores, 32gb ram, x86 (AMD 3950x).

Tested May 12 (c2a05f07d) and May 15 (aefcfff25) builds.

Restarting reth and lighthouse lets them sync closer to the head, but the timeouts return a few blocks from the head (20-50).

No Reth errors running with:

logging_level="info,net=error,reth::node::events=error,consensus::engine=debug"

Lighthouse errors:

May 15 19:04:58.474 CRIT Failed to update execution head         error: ExecutionForkChoiceUpdateFailed(EngineError(Api { error: HttpClient(url: http://127.0.0.1:8551/, kind: timeout, detail: operation timed out) })), service: beacon

EDIT1: looking at detailed lighthouse logs, I see crashes due to it running out of space on my ~/.cargo drive.

EDIT2: still happening after making space.

wakamex avatar May 15 '24 19:05 wakamex

Are you using the default Fedora file system options (btrfs, LUKS encryption)? I was having very similar issues in Fedora 39 on robust hardware. After moving the Reth data volume to an unencrypted ext3 file system, timeouts are no longer an issue.

BowTiedDevil avatar May 15 '24 22:05 BowTiedDevil

> Are you using the default Fedora file system options (btrfs, LUKS encryption)? I was having very similar issues in Fedora 39 on robust hardware. After moving the Reth data volume to an unencrypted ext3 file system, timeouts are no longer an issue.

I've been running reth successfully on this hardware since alpha1, including the last several weeks on Fedora40. Recent changes are updating Lighthouse and running Erigon separately on the same machine (in case it's interfering). Will investigate further.

wakamex avatar May 15 '24 23:05 wakamex

Related #7322

quickchase avatar May 30 '24 01:05 quickchase

I'm marking this as resolved because with 1.1.0 the default engine handler no longer suffers from this design flaw

mattsse avatar Oct 16 '24 11:10 mattsse