snarkOS icon indicating copy to clipboard operation
snarkOS copied to clipboard

Introduce --rotate-external-peers flag to improve bootstrap client peering

Open vicsn opened this issue 1 year ago • 5 comments

Motivation

Currently, the bootstrap client connections fill up too quickly, making it cumbersome for new clients/provers to join the network. By letting bootstrap clients flush their peers more frequently, new clients have an opportunity to connect.

Naming nits are welcome, we can also go for:

  • --flush-external-peers
  • --evict-external-peers
  • --bootstrap-client

A complementary change would be to increase the number of bootstrap peers.

Why not just increase max peers with a --max-peers flag?

  1. It does not fully solve the underlying problem, a bootstrap node with more peers can fill up just as well
  2. It should be hard to mess up configuration for average joe. Having a larger number of peers has not been sufficiently tested and will likely lead to performance degredation, so if possible the option should not be available.
  3. Also note that unfortunately doing this cleanly requires a lot of code changes because the existing MAXIMUM_NUMBER_OF_PEERS variable is a compile-time variable and tighly coupled with other logic. Probably because @ljedrz mentions "~20 peers in a P2P setting being a desirable number is a popular notion".

Test Plan

After approval this should be deployed and tested.

Related PRs

Aims to tackle:

  • https://github.com/AleoNet/snarkOS/issues/2960
  • https://github.com/AleoNet/snarkOS/issues/3011
  • https://github.com/AleoNet/snarkOS/issues/3284

vicsn avatar Jun 10 '24 11:06 vicsn

why not have a config file/cli commands with: an allowlist and denylist

I can see that being useful, though out of scope for this PR.

I'm in favor of peer rotation by bootstrappers, but we need to be cautious, as the heartbeat interval is currently only 25s, which may not provide enough time for the new peers to learn about others. I don't think we currently provide a list of peers as one of the initial post-handshake (OnConnect) actions, but we may want to do so now, at least for those that enable peer rotation.

Indeed, that's an issue. Letting the bootstrap peer send a PeerResponse uninvited will not be accepted by the counterparty, but we can let the counterparty send a PeerRequest: efd5ef0d2

The 4 existing bootstrap peers may relay 4*MEDIAN_NUMBER_OF_PEERS - 4 = 36 other clients, of which some hopefully are able to welcome the new client. If that's experimentally shown not to be sufficient, more (light-weight) bootstrap peers can be added to the bootstrap peer list.

Additionally, the client could choose to rotate not every heartbeat, but once every 10 heartbeats using if rand(..10) < 1, or if ledger.height % 10 == 0. But I feel its overcomplicating. What do you think?

vicsn avatar Jun 13 '24 14:06 vicsn

Additionally, the client could choose to rotate not every heartbeat, but once every 10 heartbeats

I've considered this too; this certainly makes a lot of sense (assuming a 25s window IIRC) for peer shuffling, but this may also be useful for some other heartbeat activities that don't need to happen as often.

ljedrz avatar Jun 13 '24 14:06 ljedrz

I've considered this too; this certainly makes a lot of sense (assuming a 25s window IIRC) for peer shuffling, but this may also be useful for some other heartbeat activities that don't need to happen as often.

2f7f799d6 - CI result

vicsn avatar Jun 18 '24 19:06 vicsn

One consideration is that this may affect syncing in a non-trivial way. More frequent disconnect + connections will likely slow down syncing.

One possible approach is if we can prevent the client from rotating clients if it is currently syncing. Though, this does not protect from the other side where the the node sending blocks.

raychu86 avatar Jul 03 '24 00:07 raychu86

One consideration is that this may affect syncing in a non-trivial way. More frequent disconnect + connections will likely slow down syncing. One possible approach is if we can prevent the client from rotating clients if it is currently syncing. Though, this does not protect from the other side where the the node sending blocks.

Good callout. Given that clients won't voluntarily disconnect from bootstrap peers, I don't think the "node sending blocks disconnecting" is a concern. But we should ensure bootstrap nodes (and even any client) does not disconnect from other clients they're syncing from. 7eac67f1c

vicsn avatar Jul 04 '24 03:07 vicsn

@zosorock can you confirm if this has still been running succesfully on a bootstrap node? If so I think we can mark this ready to merge

vicsn avatar Sep 25 '24 13:09 vicsn

@vicsn

Yes, it has been running on all 3 networks without any noticeable issues. I am of the opinion that one should not be syncing with a bootstrap node, especially given lots of people abuse them and connect dozens of times nonstop. I say rotate them often and make syncing with bootstrap nodes a wasteful endeavor.

zosorock avatar Oct 03 '24 16:10 zosorock

Amazing. @zosorock noticed one more small improvement which should increase the frequency of disconnects: 9dccba419

vicsn avatar Oct 03 '24 17:10 vicsn