op-node v1.13.3: thousands of stuck goroutines in p2p.SyncClient / libp2p
Bug Description
After several hours of uptime, op-node (v1.13.3) gets stuck with thousands of goroutines, mostly blocked in select, IO wait, or yamux session handlers. Sync stops making progress: the node still appears connected but no longer advances with its peers. We suspect an issue with how libp2p/yamux sessions are handled or cleaned up.
Steps to Reproduce
- Start `op-node` v1.13.3 using the official image `us-docker.pkg.dev/oplabs-tools-artifacts/images/op-node:v1.13.3` (based on Alpine Linux 3.20).
- Let it run for 6–12 hours in a production setup with inbound/outbound peer traffic.
- Inspect goroutines using `kill -ABRT $(pidof op-node)` or `pprof` (see the sketch after this list).
- Observe thousands of goroutines in a stuck state, many coming from `libp2p`, `yamux`, or `p2p.(*SyncClient).peerLoop`.
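For watching the leak over time, a small helper against the standard Go pprof goroutine endpoint is convenient. This assumes op-node's pprof HTTP server is enabled and reachable on localhost:6060 in your deployment (flag names and port vary); the counting logic and substring filters below are our own rough sketch, not an op-node tool.

```go
// goroutinecount: rough helper to count goroutines stuck in yamux or
// SyncClient.peerLoop. Assumes a Go pprof HTTP endpoint on localhost:6060.
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// debug=2 returns one full stack trace per goroutine, blocks separated by blank lines.
	resp, err := http.Get("http://localhost:6060/debug/pprof/goroutine?debug=2")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	var total, yamux, peerLoop int
	for _, block := range strings.Split(string(body), "\n\n") {
		if !strings.HasPrefix(block, "goroutine ") {
			continue
		}
		total++
		if strings.Contains(block, "go-yamux") {
			yamux++
		}
		if strings.Contains(block, "p2p.(*SyncClient).peerLoop") {
			peerLoop++
		}
	}
	fmt.Printf("total=%d yamux=%d peerLoop=%d\n", total, yamux, peerLoop)
}
```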
Expected behavior
Goroutines should terminate or be cleaned up if the stream or peer becomes inactive or broken. Instead, they accumulate indefinitely, consuming resources and blocking new peer interactions.
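To make "cleaned up" concrete, below is a minimal sketch (our own, written against the generic go-libp2p `network.Stream` API, not op-node's actual reader code) of a read loop whose blocking `Read` is bounded by the peer's context and a read deadline; `readLoop` and the 2-minute timeout are illustrative.

```go
// Sketch of a bounded read loop (illustrative, not op-node code).
package peersync

import (
	"context"
	"time"

	"github.com/libp2p/go-libp2p/core/network"
)

// readLoop reads from a libp2p stream until the peer's context is cancelled
// or a read fails. Resetting the stream on cancellation forces an in-flight
// Read to return, so the goroutine cannot hang forever on a dead peer.
func readLoop(ctx context.Context, s network.Stream, handle func([]byte)) error {
	// Unblock the Read below as soon as the peer is torn down.
	stop := context.AfterFunc(ctx, func() { _ = s.Reset() })
	defer stop()

	buf := make([]byte, 4096)
	for {
		// A per-read deadline also bounds how long a silent peer can pin us.
		_ = s.SetReadDeadline(time.Now().Add(2 * time.Minute))
		n, err := s.Read(buf)
		if err != nil {
			return err
		}
		handle(buf[:n])
	}
}
```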
Environment Information:
- Operating System: Alpine Linux 3.20 (via Docker)
- Container image: `us-docker.pkg.dev/oplabs-tools-artifacts/images/op-node:v1.13.3`
- Package Version: op-node v1.13.3
- go-libp2p: v0.36.2
- go-yamux: v4.0.1
- go-libp2p-pubsub: v0.12.0
- CPU: 32-core
- RAM: 128 GB
- Disk: NVMe SSD
- Load: <30% CPU, <40% RAM, disk idle
Configurations:
Environment variables and CLI:
OP_NODE__L2_ENGINE_AUTH_FILE=/jwtsecret
OP_NODE__L2_ENGINE_RPC=http://localhost:8551
OP_NODE__L1=<REDACTED>
OP_NODE__L1_BEACON=<REDACTED>
OP_NODE__RPC_ADDR=0.0.0.0
OP_NODE__RPC_PORT=8547
OP_NODE__METRICS_ENABLED=true
OP_NODE__P2P_ENABLED=true
OP_NODE__P2P_PRIV_PATH=/p2p-node-key.txt
OP_NODE__P2P_LISTEN_IP=0.0.0.0
OP_NODE__P2P_TCP_PORT=30303
OP_NODE__P2P_UDP_PORT=30303
Logs:
Truncated example of goroutines:
goroutine 3412893 [select]:
github.com/libp2p/go-yamux/v4.(*Stream).Read(0xc012cb2380, ...)
stream.go:111
...
github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleNewStream
comm.go:66
goroutine 4125262 [select, 2612 minutes]:
github.com/ethereum-optimism/optimism/op-node/p2p.(*SyncClient).peerLoop
sync.go:589
goroutine 4646886 [select, 325 minutes]:
github.com/libp2p/go-yamux/v4.(*Stream).Read
stream.go:111
...
github.com/libp2p/go-libp2p-pubsub.(*PubSub).handlePeerDead
comm.go:150
Additional context
- Problem persists across restarts.
- System resources are not saturated.
- We suspect either `op-node` does not cancel dead peer sessions properly, or `libp2p`/`yamux` streams are not cleaned up under some edge condition; a sketch of the kind of cleanup we would expect is below this list.
- This causes sync issues and degraded networking performance over time.
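To make the suspicion concrete, here is a minimal sketch of the cleanup pattern we would expect somewhere in the stack. This is entirely our own illustration against the generic go-libp2p API; `peerTracker`, `track`, and `watch` are hypothetical names and not how op-node is actually structured.

```go
// Hedged illustration (not op-node code): tie every per-peer goroutine to a
// context that is cancelled on disconnect, and reset leftover streams so any
// blocked Read/Write returns instead of leaking.
package peersync

import (
	"context"
	"sync"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
)

type peerTracker struct {
	mu      sync.Mutex
	cancels map[peer.ID]context.CancelFunc
}

func newPeerTracker() *peerTracker {
	return &peerTracker{cancels: make(map[peer.ID]context.CancelFunc)}
}

// track registers a per-peer context; every goroutine serving this peer is
// expected to watch ctx.Done() or be unblocked by a stream reset.
func (t *peerTracker) track(p peer.ID) context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	t.mu.Lock()
	t.cancels[p] = cancel
	t.mu.Unlock()
	return ctx
}

// watch cancels the peer's context as soon as libp2p reports the connection
// gone, and resets any streams still attached to it.
func (t *peerTracker) watch(h host.Host) {
	h.Network().Notify(&network.NotifyBundle{
		DisconnectedF: func(_ network.Network, c network.Conn) {
			t.mu.Lock()
			cancel, ok := t.cancels[c.RemotePeer()]
			delete(t.cancels, c.RemotePeer())
			t.mu.Unlock()
			if ok {
				cancel()
			}
			for _, s := range c.GetStreams() {
				_ = s.Reset() // force blocked readers/writers off this stream
			}
		},
	})
}
```

Whether op-node already does something equivalent and it races, or whether the leak sits entirely inside libp2p/yamux, is exactly what we would like help pinning down.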
Same here, also on op-node v1.13.3
goroutine 1357345 [select, 3406 minutes]:
github.com/libp2p/go-yamux/v4.(*Stream).Read(0xc00f4f9a40, {0xc0254d7e74, 0x1, 0x1})
/go/pkg/mod/github.com/libp2p/go-yamux/[email protected]/stream.go:111 +0x1a5
github.com/libp2p/go-libp2p/p2p/muxer/yamux.(*stream).Read(0x48b4ac?, {0xc0254d7e74?, 0x100c0028cae58?, 0xc0028cae60?})
/go/pkg/mod/github.com/libp2p/[email protected]/p2p/muxer/yamux/stream.go:17 +0x18
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Stream).Read(0xc02498fa80, {0xc0254d7e74?, 0x1000000001c?, 0xc0028caf10?})
/go/pkg/mod/github.com/libp2p/[email protected]/p2p/net/swarm/swarm_stream.go:58 +0x2d
github.com/multiformats/go-multistream.(*lazyClientConn[...]).Read(0xc000100008?, {0xc0254d7e74?, 0x1?, 0x1?})
/go/pkg/mod/github.com/multiformats/[email protected]/lazyClient.go:68 +0x98
github.com/libp2p/go-libp2p/p2p/host/basic.(*streamWrapper).Read(0x222e700?, {0xc0254d7e74?, 0x0?, 0x0?})
/go/pkg/mod/github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:1108 +0x22
github.com/libp2p/go-libp2p-pubsub.(*PubSub).handlePeerDead(0xc00131d8c8, {0x225a8b0, 0xc02395c500})
/go/pkg/mod/github.com/libp2p/[email protected]/comm.go:150 +0x73
created by github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleNewPeer in goroutine 1357295
goroutine 943461 [IO wait]:
internal/poll.runtime_pollWait(0x7bc6cccc8ad8, 0x72)
/usr/local/go/src/runtime/netpoll.go:351 +0x85
internal/poll.(*pollDesc).wait(0xc0183a3880?, 0xc019096000?, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
Same here, also on op-node v1.13.3, haven't tried v1.13.4 yet