besu
besu copied to clipboard
Analyse Inbound Disconnect Reasons Per Client
Related to #6805
We log enode with disconnect reason and client details with enode. Cross reference these logs and see if there's a pattern
we get Inbound UNKNOWN disconnects from besu and OpenEthereum eg OpenEthereum/v3.3.5-hbbft-0.9.3-unstable-6b34c53cb-20231028/x86_64-linux-gnu/rustc1.68.2
running a modified version of #6947 on a holesky node sample of disconnects (it's been running for under an hr) excluding MISMATCHED_NETWORK_ID disconnects since on holesky that's the vast majority
note this holesky node only has 2 peers (the 2 bootnodes)
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ cat /var/log/besu/besu.log | grep -v MISMATCHED_NET | grep bound | awk '{print $7,$9,$10,$13}' | sort
Inbound 0x UNKNOWN besu/v24.4-develop-9ae52a9/linux-x86_64/openjdk-java-21
Inbound 0x04 TOO_MANY_PEERS Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2
Inbound 0x04 TOO_MANY_PEERS Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2
Inbound 0x04 TOO_MANY_PEERS Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2
Inbound 0x04 TOO_MANY_PEERS erigon/v2.58.0-9731f716/linux-amd64/go1.21.5
Inbound 0x04 TOO_MANY_PEERS erigon/v2.58.1/linux-amd64/go1.21.6
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
updated #6947 to make the analysis easier ^^
disconnect reasons matched with client type - this is all disconnects (excluding mismatched network ID) on 6609-9x nodes (3 nodes) over about an hour https://docs.google.com/spreadsheets/d/1Bbu7YWwuwRJJ3KByfpnZsw_S1GNo4O9itM7a14GhACo/edit#gid=1710195989 this may prompt more questions than it answers - what else do we want to know? interesting that besu sends UNKNOWN disconnects more than any other peer - it's because of some code in RlpxAgent where we're checking "should we connect" from a number of sources (callbacks!) but I think it either means too many peers or node isn't ready, or already connected (it's kind of annoying that the real reason is hidden and probably is not a huge refactor but I didn't make a ticket yet)
Looking at 6609-90 which did not gain any peers via discovery over the last 9 hours
- all inbound and outbound disconnects where client info is visible (since we disconnected the 2 geth bootnodes with useless responses) are mostly bor, a couple of Nethermind and erigon (inbound) and OpenEthereum (outbound) - but plenty of other connection attempts that time out before it gets to that point
- literally no Geth in the logs after we disconnect the bootnodes
Inbound
{"@timestamp":"2024-04-16T07:19:19,451","level":"INFO","thread":"nioEventLoopGroup-3-9","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0xd9daa011e1f46a4c... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:19:35,559","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x03124cf39fc39a1c... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:19:54,968","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x8e39133d460b26b5... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:20:20,211","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0xd53a11a901865a4f... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:20:25,312","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x74cc266ba22852bd... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:20:26,486","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x1bd8a0c41ab6af59... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:20:58,096","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x9e6e852e5eee598c... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:21:03,854","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0xfdc3755978ddbfc4... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:21:19,252","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x288ed364dc9aa0ee... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:22:06,769","level":"INFO","thread":"nioEventLoopGroup-3-9","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0xa023f72df1fd1661... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:28:01,295","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x04 TOO_MANY_PEERS - 0xf64105f40f5533e4... erigon/v2.58.1/linux-amd64/go1.21.6 - 2 peers left","throwable":""}
Outbound
{"@timestamp":"2024-04-16T14:13:33,406","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x80ce9c14482cc882... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:14:06,808","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xe20ff77a66de45d2... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:19:01,066","level":"INFO","thread":"nioEventLoopGroup-3-7","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x6a453b0d5fda056a... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:20:31,079","level":"INFO","thread":"nioEventLoopGroup-3-6","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xa8eb15f64140580c... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:25:12,120","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x4ac190b9050f56b3... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:25:56,615","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xabde70c2be3120ed... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:31:59,692","level":"INFO","thread":"nioEventLoopGroup-3-6","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xbe6d6cc9093ea947... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:33:39,664","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x2e8ee850bd0ba49a... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:38:24,388","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xc99cfe0d5d31d492... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:40:53,038","level":"INFO","thread":"nioEventLoopGroup-3-7","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x3a67985c41ba86f1... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:41:24,548","level":"INFO","thread":"nioEventLoopGroup-3-2","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xdfd74a6cad8d2a78... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:46:54,078","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x7acebc19661a700c... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:47:30,784","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xa54c46e100cdfbf7... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:50:04,999","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xcd5ccc6392bce530... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:50:14,074","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x9632ba25514af69e... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:51:35,766","level":"INFO","thread":"nioEventLoopGroup-3-7","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x885d0f84f53d7639... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:53:33,645","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x8fdfb020b8f6a4c1... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:58:31,269","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x36f89cd4e8e22776... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T15:02:31,288","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x0b32358d064c5aa2... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T15:11:50,271","level":"INFO","thread":"nioEventLoopGroup-3-6","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xd700e8ab20a92f3a... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T15:17:46,784","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x008ed9ab41b53538... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T15:19:36,294","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x92f073faa1b7c747... OpenEthereum/v3.3.3-stable/x86_64-linux-musl/rustc1.47.0 - 0 peers left","throwable":""}
some interesting stats (again off 6609-90 which seems to be stuck with zero peers)
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Timed out" /var/log/besu/besu.log
27883
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Failed to connect to peer" /var/log/besu/besu.log
86300
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Received PONG packet" /var/log/besu/besu.log
91040
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "New PeerConn" /var/log/besu/besu.log
264
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep Geth /var/log/besu/besu.log
{"@timestamp":"2024-04-16T08:39:34,743","level":"INFO","thread":"nioEventLoopGroup-3-2","class":"EthProtocolManager","message":"Disconnect - active Connection? true - Outbound - 0x03 USELESS_PEER_USELESS_RESPONSES - 0xa3435a0155a3e837... Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.7 - 1 peers left","throwable":""}
{"@timestamp":"2024-04-16T08:39:35,817","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? true - Outbound - 0x03 USELESS_PEER_USELESS_RESPONSES - 0xac906289e4b7f12d... Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.7 - 0 peers left","throwable":""}
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$
Fail to connect reasons 87432
- time out
- socket exception - connection closed
- potential peer has too many peers
- connection refused
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Failed to connect to peer" /var/log/besu/besu.log
87432
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep "Failed to connect to peer" /var/log/besu/besu.log | grep -c "Timed out"
28319
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep "Failed to connect to peer" /var/log/besu/besu.log | grep -c "SocketExc"
57689
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep "Failed to connect to peer" /var/log/besu/besu.log | grep -c "TOO_MANY_PEERS"
1644
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep "Failed to connect to peer" /var/log/besu/besu.log | grep -c "Connection refused"
11
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Failed to connect to peer" /var/log/besu/besu.log
87907
I've restarted besu on 6609-90 and it has connected to the 2 bootnodes and discovery looks a bit more varied (normal?)
{"@timestamp":"2024-04-16T22:17:23,738","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x21f5d4f11abc64cd... bor/v1.2.8/linux-amd64/go1.20.5 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:23,738","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x3cde5a3e8b257b48... bor/v1.2.8/linux-amd64/go1.20.5 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:23,741","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x49ce49cf45f5fe39... bor/v1.2.8/linux-amd64/go1.20.5 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:35,840","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x315e23e64a99f688... Geth/v3.0.0-pulse-stable-7975e02e/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:35,841","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x7ab83377a659202d... Geth/v3.0.1-pulse-stable-db2c45dc/linux-amd64/go1.20.8 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:36,063","level":"INFO","thread":"nioEventLoopGroup-3-2","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x00105127dbf79041... Geth/v3.0.0-pulse-stable-7975e02e/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:36,116","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x67dd6cc1cfdd993b... Geth/v3.0.0-pulse-stable-7975e02e/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:36,118","level":"INFO","thread":"nioEventLoopGroup-3-6","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0xdc33523c001de018... Geth/v3.0.1-pulse-stable-db2c45dc/linux-amd64/go1.20.8 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:20:28,923","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xfd0f8208b189e548... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:20:43,746","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x92f073faa1b7c747... OpenEthereum/v3.3.3-stable/x86_64-linux-musl/rustc1.47.0 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:21:19,800","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x99502e898abbd65b... bor/v1.2.8/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:21:19,804","level":"INFO","thread":"nioEventLoopGroup-3-2","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x5c2ad585a7eb2244... bor/v1.2.8/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:21:25,107","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0xf783d7c28cd1b5dd... bor/v1.2.8/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:21:30,578","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x44c394b274b0581c... bor/v1.2.8/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
also noticed on 6609-00 and 6609-02, where I forcibly added the 2 bootnodes as peers, those nodes haven't added any other peers via discovery (over last 9hours) - whereas other holesky nodes started at the same time are now up to 25 peers (note 6609-90 just restarted, has 2 peers, still syncing)
with no further intervention, 6609-90 is now up to 25 peers
I would also suggest changing the function signature on:
boolean shouldConnect(Peer peer, final boolean incoming);
For the ShouldConnectCallback
and ProtocolManager
interfaces.
The result of this is that any false
returned here becomes an UNKNOWN disconnect reason.
Which is typically 80-100k per day once the max peers is saturated. For me on Holesky.
The disconnect reason would be rather interesting because:
- That makes these metrics more useful.
- We communicate that to our peer, making us a better citizen on the network.
So perhaps it should be a DisconnectReason
or null
return type. Or a dedicated type for this.
agree @Beanow - I'm actually working on a related fix right now that will help a lot to get rid of UNKOWN disconnects
@Beanow see #6968 which gets rid of majority of UNKNOWN disconnects. can then do a follow on refactor that changes the method signature as you suggest
@macfarla I also wonder out loud about 2 things:
- Would there be a substantial difference in quality of peers we receive through DNS vs Peer discovery?
- If there's research/literature on good heuristics for maintaining peers.
#6968 mainly refers to the PeerReputation, though I'm not well versed in the matter enough to know if - in aggregate - this could lead to cliques or local maximum kinds of issues.
useful command to collect data on disconnect reasons by client name
cat /var/log/besu/besu.log | grep -v MISMATCHED_NET | grep "bound - 0x" | awk '{print $16,",",$19,$22}' | sort
example output from running local mainnet node (latest main) with teku for ~45 min https://docs.google.com/spreadsheets/d/1ZqsoD-wIlxZjOW1RokS8Op50ZxjuyPYkuF4Yp3RDB4g/edit?usp=sharing
calling this done for now