P2P DNS/TCP Burst Issue on Mainnet and Goerli with Docker
Report from Discord user KuDeTa: https://discord.com/channels/905194001349627914/1014169639258964081/1016969914805927987
Neither Goerli nor mainnet will sync reliably for me. Geth, Erigon, Prysm, and Teku are all running perfectly well in the same stack.
If I delete the whole database, it does OK for a little while, then drops again. I see these DNS resolver messages sometimes too:
2022-09-07 07:24:36.608+00:00 | Timer-0 | WARN | DNSResolver | I/O exception contacting remote DNS server when resolving OSBNAM4I3ZWCOLC4QLPNVYK4C4.all.mainnet.ethdisco.net
java.io.IOException: Timed out while trying to resolve OSBNAM4I3ZWCOLC4QLPNVYK4C4.all.mainnet.ethdisco.net./TXT, id=4930
at org.xbill.DNS.Resolver.send(Resolver.java:170)
I saw someone else post that their Pi-hole (DNS server) was seeing thousands of DNS requests a second; mine is too. I was kind of wondering if there is an issue with Docker and the volume of DNS traffic. I haven't checked exactly what is going on with my router (Unifi gear), but this looks like it is actually DDoSing my network stack. All I can say right now is that every time I start this, it seems to break my DNS server (Pi-hole) and bring my router to its knees.
...and it's only Besu that has DNS bursts like this?
It's certainly only Besu that seems to burst so hard and then complain it can't find peers. My Pi-hole is set to allow 2000 queries/min, and even that limit is being hit. And my router is a UXG-PRO, prosumer (commercial) grade. But clearly this isn't a widespread issue, so could Docker be the real problem somewhere here? I will try that Xdns option and try bypassing the local DNS server to see if either improves it.
Neither Xdns nor getting rid of the local DNS server helped. The router continues to buckle under the weight of traffic as soon as the service is started; it needs some proper investigation.
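For anyone trying to quantify the burst independently of the Pi-hole counters, here is a rough sketch (mine, not from the report) that buckets DNS packets per second from `tcpdump -l -n port 53` text output. The hostnames and addresses in the comments are examples; the line format assumed is standard tcpdump text output.

```python
import re
from collections import Counter

# Matches the leading HH:MM:SS of a standard `tcpdump -n port 53` text line, e.g.
# "07:24:36.608123 IP 192.168.1.10.45678 > 192.168.1.1.53: 4930+ TXT? OSB...ethdisco.net. (62)"
TS = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d+ IP ")

def queries_per_second(lines):
    """Count DNS packets per one-second bucket from tcpdump text output."""
    buckets = Counter()
    for line in lines:
        m = TS.match(line)
        if m:
            buckets[m.group(1)] += 1
    return buckets
```

Run something like `tcpdump -l -n port 53 | python3 -c '...'` feeding stdin into `queries_per_second`, and check whether the per-second counts really sit in the thousands the moment Besu starts.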
Sep 7 11:32:14 UXGPro user.info ubios-udapi-server: wan-failover-interfaces: wf-interface-ppp0 (my ip) is down [_DD___](no dns)
It looks like that TCP TIME_WAIT (tw) traffic is the issue. I'm at the limit of my understanding right there, but no other node software causes that kind of burst:
Total: 355
TCP: 1264 (estab 6, closed 1221, orphaned 18, timewait 599)
Transport Total IP IPv6
RAW 1 0 1
UDP 35 28 7
TCP 43 31 12
INET 79 59 20
FRAG 0 0 0
(on starting Besu)
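To track whether it really is TIME_WAIT sockets piling up over time, here is a small sketch (my own, not part of the report) that pulls the TCP counters out of the `ss -s` summary shown above, so they can be logged at intervals:

```python
import re

def parse_ss_summary(text):
    """Extract TCP socket counters from `ss -s` output.

    Looks for the summary line of the form:
      TCP: 1264 (estab 6, closed 1221, orphaned 18, timewait 599)
    and returns the counters as a dict of ints.
    """
    m = re.search(
        r"TCP:\s+(\d+)\s+\(estab (\d+), closed (\d+), orphaned (\d+), timewait (\d+)\)",
        text,
    )
    if not m:
        raise ValueError("no TCP summary line found")
    total, estab, closed, orphaned, timewait = map(int, m.groups())
    return {
        "total": total,
        "estab": estab,
        "closed": closed,
        "orphaned": orphaned,
        "timewait": timewait,
    }
```

Run against the snapshot above, this gives timewait=599 against only 6 established connections, which is the shape you would expect if the client is opening and dropping hundreds of short-lived connections.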
Some debug logs are attached.
There are a lot of these log entries, which may be related:
2022-09-08 16:17:29.203+00:00 | nioEventLoopGroup-3-1 | DEBUG | AbstractPeerConnection | Terminating connection 1376930571, reason 0x01 TCP_SUBSYSTEM_ERROR
2022-09-08 16:17:29.203+00:00 | nioEventLoopGroup-3-1 | DEBUG | AbstractPeerConnection | Terminating connection 1376930571, reason 0x10 SUBPROTOCOL_TRIGGERED
and
2022-09-08 16:19:50.730+00:00 | nioEventLoopGroup-3-3 | DEBUG | AbstractHandshakeHandler | Handshake error:
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:233)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356)
at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:258)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
** UPGRADED TO 22.7.2 **
OK, testing this identical config on Goerli, using a completely separate box, but also using Docker within the same networking stack: results are identical to mainnet. Massive TCP spike, router keels over, no peers, etc.
besugoerli.txt (besu mainnet log file size too big for github, see: https://discord.com/channels/905194001349627914/1014169639258964081/1017548263852888195)
And if you want some flavour of what I go through every time I start Besu, here is a good screenshot :S
Is your router acting as your DNS provider? Maybe you could change that to something else if the router is struggling?
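If you want to take the router out of the DNS path for just this container, Docker Compose lets you pin per-service DNS servers. A sketch (not from the report; the upstream resolver addresses are examples, substitute your own):

```yaml
services:
  el-besu:
    # ... existing el-besu config unchanged ...
    dns:
      - 1.1.1.1   # example public resolvers; replace with your preferred upstreams
      - 8.8.8.8
```

That at least separates "Besu generates a pathological volume of DNS traffic" from "my local resolver can't absorb it".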
I've tried turning off my Pi-hole; it made no difference. Whatever is going on here, it seems like it should be escalated as much as you can: I've never come across a piece of software that can bork an entire network. I don't think I can blame the UXG-PRO (https://store.ui.com/products/unifi-next-generation-gateway-professional); it should be capable of handling hundreds of concurrent users and many servers. But I'm happy to try and help isolate the cause of this. I guess Docker is the prime suspect at the moment.
did you try lowering the max-peers?
Yeah, I went back down to defaults at various points.
Hardware
Hexa-core (12-thread) NUC (i7-FNH):
> No noticeable blips from node exporter at any of the times i've started besu
Docker configuration
#mainnet config
services:
el-besu:
stdin_open: true
tty: true
container_name: el-besu
image: hyperledger/besu:latest
volumes:
- /archive/el-besu:/var/lib/besu
- /home/ethereum/secrets:/secrets
restart: unless-stopped
ports: # add 8545:8545 for RPC
- "50303:50303"
- "50303:50303/udp"
networks:
- ethereum
command: >
--data-path=/var/lib/besu
--rpc-http-enabled
--rpc-http-api="WEB3,ETH,NET"
--rpc-http-host="0.0.0.0"
--rpc-http-port=8545
--rpc-http-cors-origins=*
--rpc-http-max-active-connections=65536
--rpc-ws-enabled
--rpc-ws-api="WEB3,ETH,NET,ADMIN"
--rpc-ws-host="0.0.0.0"
--rpc-ws-port=8546
--p2p-port=50303
--max-peers=40
--fast-sync-min-peers=5
--host-allowlist=*
--engine-host-allowlist=*
--engine-jwt-secret=/secrets/jwt
--engine-rpc-port=8551
--data-storage-format=BONSAI
--sync-mode=X_SNAP
--nat-method=DOCKER
--p2p-host=<myip>
--p2p-interface=0.0.0.0
stop_grace_period: 10m
#testnet config
version: "3.5"
services:
el-besu:
stdin_open: true
tty: true
container_name: el-besu
image: hyperledger/besu:latest
volumes:
- /home/el-besu:/var/lib/besu
- /home/ethereum/secrets:/secrets
restart: unless-stopped
ports: # add 8545:8545 for RPC
- "50304:50304"
- "50304:50304/udp"
networks:
- ethereum
command: >
--network=goerli
--data-path=/var/lib/besu
--rpc-http-enabled
--rpc-http-api="WEB3,ETH,NET,ADMIN"
--rpc-http-host="0.0.0.0"
--rpc-http-port=8545
--rpc-http-cors-origins=*
--rpc-ws-enabled
--rpc-ws-api="WEB3,ETH,NET,ADMIN"
--rpc-ws-host="0.0.0.0"
--rpc-ws-port=8546
--p2p-port=50304
--max-peers=40
--fast-sync-min-peers=5
--host-allowlist=*
--engine-host-allowlist=*
--engine-jwt-secret=/secrets/jwt
--engine-rpc-port=8551
--data-storage-format=BONSAI
--sync-mode=X_SNAP
--logging=DEBUG
--nat-method=DOCKER
--p2p-host=<ip>
--p2p-interface=0.0.0.0
--Xdns-enabled=true
stop_grace_period: 10m
networks:
ethereum:
name: ethereum
driver: bridge
Any status?
