Introduce --rotate-external-peers flag to improve bootstrap client peering
Motivation
Currently, the bootstrap client connections fill up too quickly, making it cumbersome for new clients/provers to join the network. By letting bootstrap clients flush their peers more frequently, new clients have an opportunity to connect.
Naming nits are welcome, we can also go for:
--flush-external-peers--evict-external-peers--bootstrap-client
A complementary change would be to increase the number of bootstrap peers.
Why not just increase max peers with a --max-peers flag?
- It does not fully solve the underlying problem, a bootstrap node with more peers can fill up just as well
- It should be hard to mess up configuration for average joe. Having a larger number of peers has not been sufficiently tested and will likely lead to performance degredation, so if possible the option should not be available.
- Also note that unfortunately doing this cleanly requires a lot of code changes because the existing
MAXIMUM_NUMBER_OF_PEERSvariable is a compile-time variable and tighly coupled with other logic. Probably because @ljedrz mentions "~20 peers in a P2P setting being a desirable number is a popular notion".
Test Plan
After approval this should be deployed and tested.
Related PRs
Aims to tackle:
- https://github.com/AleoNet/snarkOS/issues/2960
- https://github.com/AleoNet/snarkOS/issues/3011
- https://github.com/AleoNet/snarkOS/issues/3284
why not have a config file/cli commands with: an allowlist and denylist
I can see that being useful, though out of scope for this PR.
I'm in favor of peer rotation by bootstrappers, but we need to be cautious, as the heartbeat interval is currently only 25s, which may not provide enough time for the new peers to learn about others. I don't think we currently provide a list of peers as one of the initial post-handshake (OnConnect) actions, but we may want to do so now, at least for those that enable peer rotation.
Indeed, that's an issue. Letting the bootstrap peer send a PeerResponse uninvited will not be accepted by the counterparty, but we can let the counterparty send a PeerRequest: efd5ef0d2
The 4 existing bootstrap peers may relay 4*MEDIAN_NUMBER_OF_PEERS - 4 = 36 other clients, of which some hopefully are able to welcome the new client. If that's experimentally shown not to be sufficient, more (light-weight) bootstrap peers can be added to the bootstrap peer list.
Additionally, the client could choose to rotate not every heartbeat, but once every 10 heartbeats using if rand(..10) < 1, or if ledger.height % 10 == 0. But I feel its overcomplicating. What do you think?
Additionally, the client could choose to rotate not every heartbeat, but once every 10 heartbeats
I've considered this too; this certainly makes a lot of sense (assuming a 25s window IIRC) for peer shuffling, but this may also be useful for some other heartbeat activities that don't need to happen as often.
I've considered this too; this certainly makes a lot of sense (assuming a 25s window IIRC) for peer shuffling, but this may also be useful for some other heartbeat activities that don't need to happen as often.
2f7f799d6 - CI result
One consideration is that this may affect syncing in a non-trivial way. More frequent disconnect + connections will likely slow down syncing.
One possible approach is if we can prevent the client from rotating clients if it is currently syncing. Though, this does not protect from the other side where the the node sending blocks.
One consideration is that this may affect syncing in a non-trivial way. More frequent disconnect + connections will likely slow down syncing. One possible approach is if we can prevent the client from rotating clients if it is currently syncing. Though, this does not protect from the other side where the the node sending blocks.
Good callout. Given that clients won't voluntarily disconnect from bootstrap peers, I don't think the "node sending blocks disconnecting" is a concern. But we should ensure bootstrap nodes (and even any client) does not disconnect from other clients they're syncing from. 7eac67f1c
@zosorock can you confirm if this has still been running succesfully on a bootstrap node? If so I think we can mark this ready to merge
@vicsn
Yes, it has been running on all 3 networks without any noticeable issues. I am of the opinion that one should not be syncing with a bootstrap node, especially given lots of people abuse them and connect dozens of times nonstop. I say rotate them often and make syncing with bootstrap nodes a wasteful endeavor.
Amazing. @zosorock noticed one more small improvement which should increase the frequency of disconnects: 9dccba419