celestia-core
Other seed nodes aren't dialed if a seed node is dialed successfully but errors later
Bug Report
Setup
Using celestia-appd v3.4.2.
What happened?
A seed is chosen randomly when starting celestia-appd. If Pops' seed is chosen and Error 1 is hit, then the other seeds are dialed. However, if Error 2 is hit, Pops' seed is dialed again rather than the other seeds. Depending on conditions, this can end up stalling forever.
This might be because the error in Error 2 doesn't occur while dialing the seed; it occurs later, after the connection is established. Note the difference between
2:11PM ERR Error dialing seed err
and
2:11PM INF Connection is closed @ recvRoutine (likely by the other side)
The error in Error 2 doesn't come from dialing; it is reported later, when the peer is stopped:
2:11PM ERR Stopping peer for error err=EOF module=p2p peer={"Data":{},"Logger":{}}
An additional complication is that when the dial doesn't return an error, other peers are added from the seed node. But those other peers also show the same behavior: no error on dial, then an error later.
What did you expect to happen?
If a seed disconnects, other seeds should be tried.
How to reproduce it
Run celestia-appd start for network celestia with seeds configured. You may have to try a few times until the node randomly chooses the correct (or incorrect, depending on your point of view) seed.
Logs
Error 1:
2:11PM INF No addresses to dial. Falling back to seeds module=pex
2:11PM ERR Error dialing seed err="auth failure: secret conn failed: read tcp 192.168.2.43:40832->135.181.246.172:26656: i/o timeout" module=p2p seed={"id":"acca7837e4eb5f9dc7f5a94ed1d82edda6931ff8","ip":"135.181.246.172","port":26656}
2:11PM INF service start impl="Peer{MConn{51.159.204.190:26656} 19b3ef846732c465cc32da3eaddb9c0fb41f57d2 out}" module=p2p msg={} peer={"id":"19b3ef846732c465cc32da3eaddb9c0fb41f57d2","ip":"51.159.204.190","port":26656}
2:11PM INF service start impl=MConn{51.159.204.190:26656} module=p2p msg={} peer={"id":"19b3ef846732c465cc32da3eaddb9c0fb41f57d2","ip":"51.159.204.190","port":26656}
2:11PM INF executed block height=1 module=state num_invalid_txs=0 num_valid_txs=0
2:11PM INF commit synced commit=436F6D6D697449447B5B323434203630203133392031333220353220313420313034203320313732203133342031373320313437203735203320323331203230203934203231203935203132362031343220343620313236203235203635203435203131302032313620313634203736203139312032305D3A317D
Error 2:
2:11PM INF No addresses to dial. Falling back to seeds module=pex
2:11PM INF service start impl="Peer{MConn{135.181.246.172:26656} acca7837e4eb5f9dc7f5a94ed1d82edda6931ff8 out}" module=p2p msg={} peer={"id":"acca7837e4eb5f9dc7f5a94ed1d82edda6931ff8","ip":"135.181.246.172","port":26656}
2:11PM INF service start impl=MConn{135.181.246.172:26656} module=p2p msg={} peer={"id":"acca7837e4eb5f9dc7f5a94ed1d82edda6931ff8","ip":"135.181.246.172","port":26656}
2:11PM INF Connection is closed @ recvRoutine (likely by the other side) conn={"Logger":{}} module=p2p peer={"id":"acca7837e4eb5f9dc7f5a94ed1d82edda6931ff8","ip":"135.181.246.172","port":26656}
2:11PM INF service stop impl={"Logger":{}} module=p2p msg={} peer={"id":"acca7837e4eb5f9dc7f5a94ed1d82edda6931ff8","ip":"135.181.246.172","port":26656}
2:11PM ERR Stopping peer for error err=EOF module=p2p peer={"Data":{},"Logger":{}}
2:11PM INF service stop impl={"Data":{},"Logger":{}} module=p2p msg={} peer={"id":"acca7837e4eb5f9dc7f5a94ed1d82edda6931ff8","ip":"135.181.246.172","port":26656}