
Detailed Gossip Implementation Plan

Open bolt12 opened this issue 2 years ago • 6 comments

Create a Gossip planning document and put it in this repository's wiki. Further discussion about said document should be done in this issue.

Wiki page link: https://github.com/input-output-hk/ouroboros-network/wiki/Gossip-Planning

Discussion shall carry on in the respective broken-down issues:

bolt12 avatar Aug 05 '22 09:08 bolt12

How to select peers to ask from?

Peter had an argument that everybody needs to connect to ledger peers. I think this relies on the assumption that to provide a good relay, one needs to connect to SPO relays. It could be that one can provide a good service (as a relay node) and still be connected only to non-SPO relays. But I'd argue that this is not necessarily important for the current deployment, where there are no non-SPO relays at all. Hence we could just start peer sharing requests only with established upstream peers.

How to identify peers to share?

Anyway, the only way to identify them is by their (IP, PORT) pair (which is not a good identity).

Should we verify they are/were contactable/online?

The verification method on the table is to do a handshake and then terminate the connection. This requires adding a flag to the handshake's NodeToNodeVersionData: we need to make sure that the server side knows that, after negotiating the connection, the connection will be closed (no need to commit more resources, e.g. start mux, start mini-protocols, etc).
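
A minimal sketch of the probe-style handshake described above, written in Python for brevity. The `VersionData` type, its `probe_only` field and the `negotiate` helper are all hypothetical illustrations; they are not the real `NodeToNodeVersionData` or any existing ouroboros-network API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionData:
    """Hypothetical stand-in for the handshake version data."""
    network_magic: int
    probe_only: bool = False  # assumed flag: close right after negotiation


def negotiate(local: VersionData, remote: VersionData) -> str:
    """Server-side decision after receiving the client's version data.

    Returns "refuse", "accept-and-close" or "accept-and-run".
    """
    if local.network_magic != remote.network_magic:
        return "refuse"
    if remote.probe_only:
        # The client only wants to learn that we are reachable: do not
        # start the mux or any mini-protocols, just close the connection.
        return "accept-and-close"
    return "accept-and-run"
```

The point of the flag is that the server learns the client's intent during negotiation, so it can avoid committing resources to a connection that is about to be closed anyway.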

Should we know about the peer Server’s hard limit?

I think this could be a security hazard. There's no way for us to verify that information, and it would open an easy way for an attacker to pretend that he has a very high hard limit (or number of available inbound connections), and thus attract one to become its upstream peer.

There's one more handshake flag that we could negotiate: whether a peer wishes to take part in gossip, but I don't think this should be voluntary.

How to decide which peers to share?

Should they be picked at random?

Ledger peers with more stake will know more peers. We establish connections with ledger peers based on stake distribution, which solves this for us. The choice of which established ledger peers to ask should be random; this will preserve the stake distribution.

But I think we should also gossip with non-ledger peers. We should decouple policy from the mechanism. The policy would decide from the available peers which ones we should choose for share requests. The policy should know which peers are ledger peers: this would allow us to have a policy in which at least 20% of asked peers are ledger peers or at least 1 shared peer is a ledger peer, etc.
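
One way to picture such a policy, as a rough Python sketch. The function name `pick_peers_to_ask`, its parameters and the 20% default are illustrative assumptions; the mechanism (sending share requests) is deliberately left out, and the policy only decides whom to ask.

```python
import math
import random

Peer = tuple[str, int]  # (IP, PORT)


def pick_peers_to_ask(established, is_ledger_peer, count,
                      min_ledger_fraction=0.2, rng=None):
    """Pick `count` established peers to send share requests to.

    Hypothetical policy: at least `min_ledger_fraction` of the asked
    peers are ledger peers (as far as enough of them are available);
    the rest are drawn uniformly at random from everything else.
    Assumes `established` contains no duplicates.
    """
    rng = rng or random.Random()
    ledger = [p for p in established if is_ledger_peer(p)]
    others = [p for p in established if not is_ledger_peer(p)]
    count = min(count, len(established))
    # Number of ledger peers the policy requires, capped by availability.
    need_ledger = min(len(ledger), math.ceil(min_ledger_fraction * count))
    chosen = rng.sample(ledger, need_ledger)
    # Fill the remaining slots from all peers not chosen yet.
    remaining = [p for p in ledger if p not in chosen] + others
    chosen += rng.sample(remaining, count - len(chosen))
    return chosen
```

Because the policy receives `is_ledger_peer` as a parameter, swapping in a different rule (for example "at least 1 ledger peer") does not touch the code that performs the requests.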

Should we let others know about adversarial nodes too?

There's no way that this information can be validated, and thus we should not pass this information to others.

When to or not to Peer Share?

Should we have targets for Peer Share Peers?

Yes, but we don't need to make it explicit. We have a ledger peer target and everything above it should be filled with either ledger peers or shared peers.

In what context does it make sense to perform Peer Sharing (i.e. while bootstrapping, syncing, caught up, all the time)?

That's to be decided. If in the bulk-sync mode we decide to have a more or less constant number of peers, we will want to disable promoting or using gossip peers.

Should any type of node not participate in Peer Sharing (BP, Relay, Wallet, etc..)?

Yes, BP nodes should not peer share at all, hence we should be able to turn Peer Sharing on/off in the configuration file. For relay nodes and wallets it will usually make sense to gossip. For wallet nodes it might make sense to announce themselves as peers which should not be shared. We actually already have a flag which can be used for that purpose: DiffusionMode. There is no reason to share a peer which runs in the InitiatorOnlyDiffusionMode. However, this option should be controlled by a Daedalus user.
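
The filtering rule above could look roughly like this in Python. The enum mirrors the two `DiffusionMode` constructors by name only; `KnownPeer`, `shareable_peers` and the `peer_sharing_enabled` switch are hypothetical names standing in for whatever the configuration file would expose.

```python
from dataclasses import dataclass
from enum import Enum


class DiffusionMode(Enum):
    INITIATOR_ONLY = "InitiatorOnlyDiffusionMode"
    INITIATOR_AND_RESPONDER = "InitiatorAndResponderDiffusionMode"


@dataclass(frozen=True)
class KnownPeer:
    address: tuple[str, int]  # (IP, PORT)
    diffusion_mode: DiffusionMode


def shareable_peers(known, peer_sharing_enabled):
    """Return the peers this node is willing to hand out.

    A node with peer sharing switched off (e.g. a block producer)
    shares nothing; otherwise initiator-only peers are filtered out,
    since nobody can connect to them anyway.
    """
    if not peer_sharing_enabled:
        return []
    return [p for p in known
            if p.diffusion_mode is DiffusionMode.INITIATOR_AND_RESPONDER]
```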

Should we churn shared peers?

Yes. The churn mechanism which demotes peers should not distinguish between non-ledger and ledger peers. I think we won't need to change churn when implementing Peer Sharing.

Should we have a target for hot Peer Sharing peers?

You mean for non-ledger hot peers. This is implicit: the difference between ledger peers and active peers. But we might want to make the outbound-governor more aggressive on replacing ledger peers with non-ledger peers if they are available (and the ledger peer target is met).

coot avatar Aug 05 '22 12:08 coot

Anyway, the only way to identify them is by their (IP, PORT) pair (which is not a good identity).

Right, maybe "identify" is the wrong word, what I meant was more "How to decide which peers to share"

The verification method on the table is to do a handshake and then terminate the connection. This requires adding a flag to the handshake's NodeToNodeVersionData: we need to make sure that the server side knows that, after negotiating the connection, the connection will be closed (no need to commit more resources, e.g. start mux, start mini-protocols, etc).

This raises a different question: when will we perform such a handshake? For inbound connections we can already verify whether the node is up for Peer Sharing, but for peers from other sources which have not yet done a handshake with us, when is a good time to handshake with them?

I think this could be a security hazard. There's no way for us to verify that information, and it would open an easy way for an attacker to pretend that he has a very high hard limit (or number of available inbound connections), and thus attract one to become its upstream peer.

There's one more handshake flag that we could negotiate: whether a peer wishes to take part in Peer Sharing, but I don't think this should be voluntary.

Yes, I agree, but if this is the case we would need to work around the current rate limiting in the Server, right? For example, if a node has already met its hard limit, it will refuse any attempt to perform a handshake for Peer Sharing purposes. Or is this not a problem?

Ledger peers with more stake will know more to-share peers. We establish connections with ledger peers based on stake distribution, which solves this for us. The choice of which established ledger peers to ask should be random; this will preserve the stake distribution.

But I think we should also peer share with non-ledger peers. We should decouple policy from the mechanism. The policy would decide from the available peers which ones we should choose for share requests. The policy should know which peers are ledger peers: this would allow us to have a policy in which at least 20% of asked peers are ledger peers or at least 1 asked peer is a ledger peer, etc.

I am not sure why you mention that upstream peers are based on stake distribution, I don't think this is currently true, is it?

I think what's decided is to gossip with upstream peers. If that's the case, in the beginning most upstream peers will be ledger (or IOHK relay) nodes, but as soon as sharing results start showing up, our upstream peers will be mostly non-ledger, so we'll have to make share requests to non-ledger peers at some point, since they will be the majority of upstream peers. I don't think such a policy as you describe is needed, since this will already happen as a consequence of Peer Sharing.

Having said this I think this needs further clarification: Should we only share non-ledger peers? Should we have a target for hot ledger peers?

Yes, but we don't need to make it explicit. We have a ledger peer target and everything above it should be filled with either ledger peers or shared peers.

What do you mean by the second sentence? We have a ledger peer target; if that target is met then why should we fill it? Or are you saying this target can now mean ledger+shared peers?

You mean for non-ledger hot peers. This is implicit: the difference between ledger peers and active peers. But we might want to make the outbound-governor more aggressive on replacing ledger peers with non-ledger peers if they are available (and the ledger peer target is met).

Hmmm, yes and no. It depends on some things I am still not sure about, such as: should we only gossip about non-ledger peers? If that's the case and we really want to prioritize non-ledger over ledger peers as upstream, maybe a hot ledger peer target makes sense (unless the Eclipse evasion design takes this into consideration). Also notice that, if that's the case, making the outbound-governor more aggressive towards replacing ledger peers with non-ledger peers might not be needed: together with churning, non-ledger peers are likely to become more abundant, which increases the chance of them being upstream.

bolt12 avatar Aug 05 '22 12:08 bolt12

This raises a different question: when will we perform such a handshake?

I wouldn't do that in the same thread as the gossip mini-protocol, but rather push the peers to a queue, and have a validator which pulls them on the other side, runs the handshake (with a reasonable timeout), and makes the result available to the outbound-governor. We could also consider validating peers in bulk (5 at a time), but one should think about whether this is necessary.
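
The queue-plus-validator idea can be sketched in Python as follows. Everything here is an assumption made for illustration: `validate_batch`, the injected `handshake(peer, timeout)` callable (which is expected to honour the timeout itself), and the defaults of 5 peers and 5 seconds.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

Peer = tuple[str, int]  # (IP, PORT)


def validate_batch(candidates, handshake, batch_size=5, timeout=5.0):
    """Pull up to `batch_size` peers off the queue and probe them.

    `candidates` is a queue.Queue filled by the peer sharing side.
    `handshake(peer, timeout)` is a caller-supplied function returning
    True when the peer completed a handshake within `timeout` seconds.
    Returns the peers that passed, ready for the outbound-governor.
    """
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(candidates.get_nowait())
        except queue.Empty:
            break
    if not batch:
        return []

    def probe(peer):
        try:
            return handshake(peer, timeout)
        except OSError:
            # Connection refused, unreachable, timed out, etc.
            return False

    # Probe the whole batch concurrently so that one slow peer does not
    # delay the others beyond a single timeout.
    with ThreadPoolExecutor(max_workers=len(batch)) as pool:
        results = list(pool.map(probe, batch))
    return [peer for peer, ok in zip(batch, results) if ok]
```

Keeping the validator on the consuming side of the queue means the mini-protocol thread never blocks on a handshake with an unknown peer.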

Yes, I agree, but if this is the case we would need to work around the current rate limiting in the Server, right? For example, if a node has already met its hard limit, it will refuse any attempt to perform a handshake for gossip purposes. Or is this not a problem?

That's actually good, the system will not direct more connections towards that overloaded host.

I am not sure why you mention that upstream peers are based on stake distribution, I don't think this is currently true, is it?

Upstream ledger peers (so the ones which we connect to in order to receive information from them) are drawn based on stake distribution. It is a bit buried in the root peers API: when there are too few root peers, the outbound governor will ask for more. They will come either from the root peers in the network topology file or from the ledger; the ledger ones will be drawn according to stake. Check here.
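
To illustrate what "drawn according to stake" means, here is a small Python sketch of stake-weighted sampling. It is only an illustration of the idea: the name `draw_ledger_peers` and the `stake_by_pool` mapping are hypothetical, sampling is done with replacement for simplicity, and the actual ledger peer selection in the node may work differently.

```python
import random


def draw_ledger_peers(stake_by_pool, n, rng=None):
    """Draw `n` pools with probability proportional to their stake.

    `stake_by_pool` maps a pool identifier to its stake. Sampling is
    with replacement, so the same pool may appear more than once.
    """
    rng = rng or random.Random()
    pools = list(stake_by_pool)
    weights = [stake_by_pool[p] for p in pools]
    return rng.choices(pools, weights=weights, k=n)
```

A pool holding twice the stake of another is, on average, drawn twice as often, which is why asking a uniformly random subset of the already established ledger peers keeps the stake weighting intact.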

I don't think such a policy as you describe is needed since this will already happen as a consequence of Peer Sharing.

We will always have at least targetNumberRootPeers of ledger peers (well, some of them might be root peers but not ledger, but that's ok), and these will be established.

Should we only share non-ledger peers?

There's no reason to pass information about ledger peers through peer sharing; every peer (which is sufficiently synced) knows about them.

Should we have a target for hot ledger peers? What do you mean by the second sentence? We have a ledger peer target; if that target is met then why should we fill it? Or are you saying this target can now mean ledger+shared peers?

We have targetNumberRootPeers, which guarantees that at least that many peers are either ledger peers or root peers from the topology file. This target is a bit different from all the other targets: it's one-sided (i.e. we'll have at least that many peers, could be more). If that target is 5 and the target of active peers is 20, then at least 5 peers will be root peers, and each of the remaining 15 will be either a shared peer or a root peer. I think this mechanism is good enough. However, what we might want to do is to make sure that if we can choose a shared peer then we prefer a shared peer over a ledger peer.
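
The arithmetic in the 5/20 example, together with the "prefer shared peers" rule, can be sketched like this in Python. The function `fill_active_slots` and its parameters are hypothetical; it only illustrates the one-sided target and is not how the outbound-governor is implemented.

```python
def fill_active_slots(root_peers, shared_peers,
                      target_root=5, target_active=20):
    """Split the active slots between root and shared peers.

    The root target is one-sided: up to `target_root` root peers are
    taken first. The remaining slots prefer shared peers and fall back
    to further root peers only when shared peers run out.
    Returns (chosen_root_peers, chosen_shared_peers).
    """
    chosen_roots = root_peers[:target_root]
    free = max(0, target_active - len(chosen_roots))
    chosen_shared = shared_peers[:free]
    free -= len(chosen_shared)
    # Not enough shared peers: top up with additional root peers.
    extra_roots = root_peers[target_root:target_root + free]
    return chosen_roots + extra_roots, chosen_shared
```

With 10 root peers and 30 shared peers available this yields 5 root and 15 shared peers; with only 3 shared peers available it yields all 10 root peers plus the 3 shared ones, so the root count exceeds its target, as the one-sided rule allows.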

coot avatar Aug 05 '22 13:08 coot

Upstream ledger peers (so the ones which we connect to in order to receive information from them) are drawn based on stake distribution. It is a bit buried in the root peers API: when there are too few root peers, the outbound governor will ask for more. They will come either from the root peers in the network topology file or from the ledger; the ledger ones will be drawn according to stake. Check here.

Got it! But I understand upstream peers as hot peers, or do warm peers also count?

bolt12 avatar Aug 05 '22 14:08 bolt12

This has become a separate issue - #3956

Peter had an argument that everybody needs to connect to ledger peers. I think this relies on the assumption that to provide a good relay, one needs to connect to SPO relays. It could be that one can provide a good service (as a relay node) and still be connected only to non-SPO relays. But I'd argue that this is not necessarily important for the current deployment, where there are no non-SPO relays at all. Hence we could just start with peer sharing only with established upstream peers.

I believe that the issue here is more around having at least one peer that is on the honest chain. We now have an idea of how many that should be during the Genesis phase, but in the Tip-following phase I think that we are still going to need a non-zero number (probably 2) of connections to ledger peers. So, I think, Peter's assertion was not about performance but part of the eclipse-avoidance issue.

The Peer Sharing process will have to start from the peers to which we are (initially) connected - that will be either the bootstrap peers (which can be seen as a pre-loading of the ledger peer cache) or the off-chain ledger peers themselves.

Would it be worth pulling this out into a separate issue? Call it something like "Outline of Peer Sharing bootstrap"; in it we can expand on the expected evolution, discuss the likely dynamics, etc.

njd42 avatar Aug 08 '22 09:08 njd42

Updated the wiki document with some of the things highlighted in this discussion so far.

bolt12 avatar Aug 16 '22 16:08 bolt12