zebra
zebra copied to clipboard
The number of outbound connections reported by progress bars periodically glitches
Description
When syncing on Testnet, I noticed the number of outbound connections reported by the new progress bars periodically jumps from ~7 to the limit of 75 and then back to ~7. The period seems to be equal to the crawl_new_peer_interval
setting.
Analysis
That's probably because the connection counter actually comes from the connection limit, which is updated when we start making a connection. (Not when we successfully connect.)
So every few minutes, we re-try all the failed peers at the same time, the connection limit goes to 75, and then it drops back when they all fail.
Possible Solutions
Reducing the progress bar update interval would hide this, but it would still occasionally happen. (And be visible for longer.)
The real fix would be to move the progress bar to the peer set. We track inbound/outbound on the ConnectedAddr type in ConnectionInfo, which is part of the LoadTrackedClient in the PeerSet.
We do have this info in the peer set, I updated the ticket.
Is this issue still relevant?
Yes, it is.
Okay, I'd like to work on this then
Great, feel free to open a PR.
Hey, I have some questions. As far as I understand, the ActiveConnectionCounter
in the initialize.rs
for the outbound connections should remain since it is required to stop new connection attempts when the limit is reached. So is it only the howudoin progress bar I should move in the PeerSet
? Secondly, getting the number of outbound connections inside the PeerSet
using the ConnectedAddr
would be possible with something like:
let outbound_connections = self.ready_services.values()
.filter(|svc| matches!(svc.connection_info.connected_addr, ConnectedAddr::OutboundDirect { .. }))
.count();
but the problem is that unready_services is FuturesUnordered
and I can't check whether the services inside are inbound or outbound. I have thought about incrementing/decrementing the counter at all places where services are added and dropped but that is also not possible for the same reason that checking the unready services is not possible. Do you have any suggestions? Should I maybe use another ActiveConnectionCounter
for this?