
Proposed attnets revamp

Open djrtwo opened this issue 4 years ago • 37 comments

Attnets revamp

Since the launch of the beacon chain, the "backbone" of attestation subnets (attnets) has relied upon staking nodes connecting to a random set of subnets. The size of this random set is dictated by the number of validators attached to the node, up to a maximum of 64 (ATTESTATION_SUBNET_COUNT, which maps to the expected SHARD_COUNT). The general idea at genesis was that a node's subnet requirements would scale linearly with its validators once sharding is released, so this linear subnet requirement could stand in as an "honesty" assumption for subnet backbones until sharding comes around.

An attestation subnet backbone is required so that at each epoch, validators can quickly find and publish to their assigned subnet. If there were no notion of persistence in these subnets, there would be no subnet to "find" in the ~6 minute window and thus no clear place to publish individual attestations before they are aggregated.

Backbone requirements:

  • Subnets are relatively stable, slot-to-slot and epoch-to-epoch, allowing for reliable dissemination of messages on demand
  • Subnet entry points can be found in a relatively short time (< 1 minute) and with short search overhead (within a few hops of the DHT)
  • (Nice to have) A method to discern whether a node is "honestly" performing the expected backbone duty

Problems

There are a few issues with the current structure:

  1. In practice, this likely creates overly populated subnets, increasing the network's total bandwidth consumption with little to no gain.
  2. It relies on an unenforceable "honesty" of validator nodes when the rational behavior is to turn your attnets down to 1 or even 0.
  3. As non-staking nodes (user nodes) come online in ever greater numbers, such (0-attnet) nodes will crowd the DHT, making the task of finding peers on particular subnets increasingly difficult. In the event that user nodes outpace staking nodes by 10-to-1 (a situation that should be applauded!), finding attestation backbones would become 10x more difficult.

Proposed solution

In an effort to solve the above issues, we propose:

  • Remove random subnets per validator
  • Add a single deterministic subnet per node, as a function of (node_id, epoch)

Rather than putting the backbone requirement on a brittle validator-honesty assumption, this puts the backbone requirement on the entire network of full nodes such that, on average, one out of every ATTESTATION_SUBNET_COUNT nodes will be on a particular subnet.

This means that the size of subnets becomes a function of the total number of nodes on the network, rather than of staking node count combined with validator-node density. This means that we simultaneously reduce the overpopulation of attnets by a small number of staking nodes and ensure that, even if the Ethereum network (and DHT) grows by orders of magnitude, attnets can still be found within a few hops.
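
As a rough illustration only: with something like 4,500 nodes on mainnet (the estimate used later in this thread) and ATTESTATION_SUBNET_COUNT = 64, each subnet backbone would average about 4500 / 64 ≈ 70 nodes, independent of how many of those nodes are staking.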

Additionally, because the requisite subnet subscription is a function of a node's node-id, honesty/dishonesty with respect to the attnet backbone can be deterministically assessed, allowing for peer downscoring and disconnection of dishonest peers.

The downside to this approach is that it puts a minimum of one attnet of load on every node rather than just on staking nodes, but by our estimation this is not a very high burden on home node resources, and the benefit of meeting the backbone requirements far outweighs the cost.

Concrete spec mods

Remove random subscriptions

Add node-id subscription

  • Create a function compute_subnet_from_node_id(node_id) -> uint64 that takes in a node_id and returns a value on [0, ATTESTATION_SUBNET_COUNT). Consider an epoch param that causes these subscriptions to slowly rotate on the order of ~1 day (a sketch follows this list)
  • Add "MAY downscore peers that do not actively subscribe/participate in their currently assigned subnet based on compute_subnet_from_node_id"
  • In the Lookahead section, replace the attnets DHT search with a compute_subnet_from_node_id DHT search
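
As a rough illustration only (not a spec), one way such a function could look, assuming a simple hash-based mapping and a hypothetical SUBNET_ROTATION_PERIOD_EPOCHS constant for the ~1 day rotation; note that the discussion below ends up favouring a node-id-prefix mapping instead, so the DHT XOR metric can be leveraged:

    from hashlib import sha256

    ATTESTATION_SUBNET_COUNT = 64
    SUBNET_ROTATION_PERIOD_EPOCHS = 256  # hypothetical constant, roughly one day

    def compute_subnet_from_node_id(node_id: int, epoch: int) -> int:
        # Mix the node_id with the current rotation period so that assignments
        # are stable within a period but slowly rotate over time.
        period = epoch // SUBNET_ROTATION_PERIOD_EPOCHS
        digest = sha256(node_id.to_bytes(32, "big") + period.to_bytes(8, "big")).digest()
        return int.from_bytes(digest[:8], "big") % ATTESTATION_SUBNET_COUNT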

Strategy

We'd likely want to simulate and test this change in strategy in a controlled environment before pushing this to testnets and then mainnet.

Such a controlled environment for testing gossipsub at scale seems critical to a number of the network optimization investigations underway.

Protocol Labs' Testground could be a good candidate. Alternatively, another simulation framework, or even spinning up distributed networks of 1k+ nodes for ~1-day tests, could also be a viable path.

EDITED TO USE node_id instead of peer_id per discussions below

djrtwo avatar Nov 29 '21 20:11 djrtwo

This solution is really neat; having an enforceable condition is a great improvement. However, it must be noted that for small stakers with a single key this requirement doubles their attestation processing cost, while for large stakers / pools it'll represent a tiny increase, going against:

The general idea at genesis was that a node's subnet requirements would scale linearly with its validators once sharding is released

Post sharding, will single-key stakers be expected to fully validate two shards if their random subnet != current duty subnet?

dapplion avatar Nov 30 '21 07:11 dapplion

Add a single random subnet per node, dictated by the peer-id

@djrtwo Should the 'random' word be removed here? As I understand it, the subnet is defined by a deterministic function of (peerId, epoch)?

Nashatyrev avatar Nov 30 '21 13:11 Nashatyrev

Good proposal from my perspective 👍

It also has an extra benefit: it would make validator nodes not so trivially discoverable (currently it is even easy to estimate the number of validators per node with discovery attSubnets and syncSubnets attributes)

However, we need to consider a potential attack on a single attestation subnet. An attacker could craft specific nodeIds and spin up malicious nodes to eclipse a specific subnet for some specific range of epochs. An attack on a single subnet would require just 1/64 of the malicious nodes needed to attack the whole network.

It might make sense to think of mixing a randao value into the compute_subnet function. That would significantly limit an attacker's time to build its 'subnetwork' of nodes in terms of discovery and gaining reputation. This would prevent building malicious nodes for an attack at some distant epoch. That's just a spontaneous idea.

Nashatyrev avatar Nov 30 '21 15:11 Nashatyrev

Should the 'random' word be removed here? As I understand it, the subnet is defined by a deterministic function of (peerId, epoch)?

yeah, I guess I mean that you get an effective pseudo-random distribution. Will update

djrtwo avatar Nov 30 '21 15:11 djrtwo

it would make validator nodes not so trivially discoverable

right! an incredible amount of information is leaked today.

An attacker could craft specific nodeIds and spin up malicious nodes to eclipse a specific subnets for some specific range of epochs

I see. I suppose you can attempt to flood the neighboring DHTs with peer-ids of a particular type so that when a node searches for a subnet, it finds these "fake" subnet peers.

I would argue that the damage of eclipsing a small set of nodes wrt an attestation subnet is likely limited -- a few validators might not get their individual attestations included. A targeted DoS could do the same thing with likely even fewer resources.

Additionally, gossipsub's IWANT/IHAVE and other mechanisms exist primarily for these types of cases: to aid dissemination of data even under various attacks on subnets.

An attack on a single subnet would require just 1/64 of malicious nodes than attacking the whole network

Right, but this is likely better in the long run than our status quo. In the event that you have a 10:1 (or even 100:1) user_node:staking_node ratio, the feasibility of attacking subnets (which are now populated as a function of total network nodes) degrades more and more.

It might make sense to think of mixing a randao to the compute_subnet function

Ah, okay. So you can't pre-compute a node-id table that will be useful for all time. This seems reasonable if you had short subscription times, but I think you likely want subscription times on the order of, at a bare minimum, 1 hour (or at least multiple epochs) to reduce subnet backbone churn. Once you get to 1 hour or 1 day, an attacker would be able to easily "mine" node-ids with a given randao. I suppose it makes it harder to get your peers positioned in the DHT though.

djrtwo avatar Nov 30 '21 15:11 djrtwo

Right, but this is likely better in the long run over our status quo.

Yep, the present solution looks even more vulnerable to me.

Ah, okay. So you can't pre-compute a node-id table that will be useful for all time. This seems reasonable if you had short subscription times, but I think you likely want subscription times on the order of, at a bare minimum, 1 hour (or at least multiple epochs) to reduce subnet backbone churn. Once you get to 1 hour or 1 day, an attacker would be able to easily "mine" node-ids with a given randao. I suppose it makes it harder to get your peers positioned in the DHT though.

Yes, that's totally not a silver bullet, but it limits an attacker's time to 1 hour/1 day. Additionally, nodes wishing to join a subnet could prioritize nodes which were seen in discovery for longer than 1 hour/1 day, since those definitely could not have been specially crafted for that kind of attack.

Nashatyrev avatar Nov 30 '21 16:11 Nashatyrev

Additional argument pro this solution:

I'm now gathering mainnet network statistics and it looks like large validator nodes (with many att subnet subscriptions) tend to restrict inbound connections: i.e. they either immediately drop the connection or keep you connected for just several seconds and then disconnect with TOO_MANY_PEERS. (I think I can get back soon with some specific statistics.)

That behavior seems reasonable when you have much at stake, since every inbound connection adds more risk of being attacked. So I believe this tendency could evolve further with time, making it even harder to join a subnet.

UPD: some connection statistics:

Number of nodes tried to connect (>= 20 connection attempts): 2769
Total connection attempts: 812291

Nodes with subnet count [1..31]:
    Connected at least once: 1818
    Never connected: 451
Nodes with subnet count [32..64]:
    Connected at least once: 330
    Never connected: 170

Probability of being connected to a node ("connected" here means negotiating at the Eth2 level and then staying connected for at least 5 slots), grouped into 'small' validator nodes (from 1 to 31 att subnets) and 'large' validator nodes (more than 31 att subnets):

[image: connection probability for 'small' vs 'large' validator nodes]

So large validator nodes are indeed harder to connect to, but the overall statistics don't look too dramatic.

Nashatyrev avatar Nov 30 '21 16:11 Nashatyrev

I would argue that, due to our attestation subnet search requirements, all nodes should have both a MAX_PEERS and a TARGET_PEERS value that by default are different (MAX_PEERS > TARGET_PEERS). This would allow for new inbound connections to be made (up to MAX_PEERS) while the node then prunes down to TARGET_PEERS.

Otherwise, node connections can become far too rigid, and peer discovery for particular purposes (e.g. finding a subnet) can become more prone to error.

Lighthouse uses this mechanism and has since genesis.

We might even consider specifying this as honest behaviour (e.g. MAX_PEERS >= TARGET_PEERS + 5 for any value of TARGET_PEERS). It's unenforceable but a better default/honest behaviour, imo.

djrtwo avatar Nov 30 '21 19:11 djrtwo

That behavior seems reasonable

I was probably trying to say 'behavior rational from the validator node's perspective' rather than 'reasonable from the network's point of view'.

I agree that dropping inbound connections is not an honest behavior, but if I were maintaining a node with a lot of validators and acting rationally, I would stick to some kind of conservative peer management policy, like peering with just a limited known subset of nodes or something similar.

Nashatyrev avatar Nov 30 '21 20:11 Nashatyrev

I would argue that due to our attestation subnet search requirements that all nodes should have both a MAX_PEERS and TARGET_PEERS value that by-default are different (MAX_PEERS > TARGET_PEERS). This would allow for new inbound connections to be made (up to MAX_PEERS) while the node then prunes down to TARGET_PEERS.

Teku does this in a similar way to Lighthouse, but because there are so many nodes with closed ports, any node with open ports gets far more incoming connection attempts than it needs (or can reasonably be expected to accept). So even with a large difference between TARGET_PEERS and MAX_PEERS, the node winds up constantly at MAX_PEERS. That effect is even more amplified for a node that is subscribed to a lot of subnets, because it's then a high-value peer to have.

We've been recommending to users for a while that the best way to improve attestation performance is to ensure incoming connections work, because that not only lets you find peers faster, it also gives you a much wider selection of peers to choose from, since you will inevitably have more incoming requests than your node could possibly handle.

ajsutton avatar Nov 30 '21 23:11 ajsutton

Also in favour of the change. I'm just thinking about the attacks mentioned and also discovery efficiency.

Just to clarify, anyone can still spin up a bunch of nodes, subscribe to a single subnet, and try to censor messages/cause havoc on that subnet. The change here (as I understand it) is that we would actively search for specific peer-ids rather than ENRs with subnet fields. The attack as it stands today would consist of just spinning up a bunch of nodes with ENRs saying you are subscribed to a subnet, and discovery would try to find those nodes. I think the change suggested here is strictly safer in that now you can't just modify an ENR field; you have to pre-generate a bunch of node-ids.

I think the next interesting thing is about whether you can eclipse a target node via discovery with your malicious nodes (that you've pregenerated). We usually search for random node ids, which makes eclipsing an individual node hard provided there are sufficient good nodes in the DHT.

Another thought I had, which would make searching easier/more efficient but could potentially increase the risk of these eclipses: currently, we select a random node-id and trawl through the closest nodes to this random id until we find enough that have an ENR matching a subnet. Instead, we could potentially do the following:

Instead of having it based on the peer-id, have it based on the node-id (used in discovery; it can be calculated from the ENR and, in most cases, from the peer-id). If we base the subnet a node should be subscribed to on the first 8 bits of the node-id (i.e. the subnet number in binary), with the epoch rotating them in some way, then we can search for peers on a subnet by searching discovery for nodes closest to a node-id whose first 8 bits are the subnet-id we want. I think this would leverage the XOR metric in discovery and the query mechanism to find nodes based on this subnet more efficiently than randomly trawling the DHT (i.e. we can get discovery to specifically find nodes on a subnet). I haven't fully thought through the drawbacks in possible eclipsing however.
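
To make the prefix idea concrete, a sketch under assumed parameters (not a spec): 6 prefix bits suffice for 64 subnets, matching the >> 250 in the function sketched later in this thread, and the helper names here are purely illustrative.

    import secrets

    ATTESTATION_SUBNET_COUNT = 64   # 2**6, so 6 prefix bits are sufficient
    PREFIX_BITS = 6

    def subnet_from_node_id(node_id: int) -> int:
        # The top PREFIX_BITS of the 256-bit node-id select the subnet.
        return node_id >> (256 - PREFIX_BITS)

    def lookup_target_for_subnet(subnet_id: int) -> int:
        # Fix the top bits to the desired subnet and randomize the rest; using
        # the result as a discv5 lookup target lets the XOR-distance walk
        # converge on nodes whose ids (and hence subnets) share that prefix.
        assert 0 <= subnet_id < ATTESTATION_SUBNET_COUNT
        return (subnet_id << (256 - PREFIX_BITS)) | secrets.randbits(256 - PREFIX_BITS)

An epoch-based rotation could then be layered on top, e.g. by offsetting the prefix by the rotation period, as in the function sketched further down.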

AgeManning avatar Dec 01 '21 00:12 AgeManning

One advantage of the current strategy is that while subnets are more concentrated, an attestation that is propagated there has a very high likelihood of making it into all the aggregates eventually broadcast. This is simply due to the fact that only nodes with active validators are subscribed to a subnet, so propagating an attestation to all nodes in a particular subnet requires a smaller number of hops.

Shifting this to be dependent on a node's id would bring both staking and non-staking nodes into each subnet. If we expect non-staking nodes to eventually be 10 times the number of staking nodes, this is a non-trivial increase. An increase in the number of hops would have an effect on whether a particular individual attestation can make it into the aggregate in time. This would definitely require more involved testing/simulations to see how it would affect the overall participation of the network.

Things we would have to look out for:

  • With the increased number of hops per subnet, how much longer would it take for an attestation to reach all the relevant aggregators in that subnet?
  • If attestations now take longer to be propagated across the network, would this change lead to more disjoint aggregates, and with that a higher CPU load for the entire network, as more individual aggregates now have to be verified?
  • How would this perform in situations such as late blocks? If blocks are late, there is a chance that a particular attestation might never make it into an aggregate in time, due to the increased number of hops across the network, and permanently go 'missing'.

Also, on another note, the peer_id isn't perfectly random. Regardless of what curve is used to create the underlying public key, the peer-id spec prepends the function code and size of the key: https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#peer-ids. This can lead to the network being biased towards particular subnets. I would agree with @AgeManning that we should be using the node-id here, as we do for discovery in general. The node-id, in contrast, is simply the keccak-256 hash of the uncompressed pubkey.

nisdas avatar Dec 01 '21 02:12 nisdas

One thing to note is that this will degrade on small-node-count networks (e.g. testnets and devnets).

To counteract this -- I propose adding a configuration value (only updated at network upgrades) called ESTIMATED_NETWORK_NODE_COUNT (or something...) along with MINIMUM_TARGET_SUBNET_BACKBONE, which together dictate how many subnets a node must be on for honest participation.

So concretely:

  • ESTIMATED_NETWORK_NODE_COUNT -- e.g. 4500 on mainnet, 100 on prater, etc
  • MINIMUM_TARGET_SUBNET_BACKBONE -- e.g. 50 or whatever we deem as safe in some analysis
  • change compute_subnet_from_node_id to compute_subnets_from_node_id(node_id, epoch) -> List[uint64] -- thus returning a list of size max(1, MINIMUM_TARGET_SUBNET_BACKBONE // (ESTIMATED_NETWORK_NODE_COUNT // ATTESTATION_SUBNET_COUNT)), up to a maximum of ATTESTATION_SUBNET_COUNT (sketched below). This would ensure that the honest behaviour based on network node count estimates is to meet the minimum subnet backbone target. On mainnet this would likely be 1, but on small testnets it would ramp up.
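
A rough sketch of what that could look like, using the example numbers above and assuming compute_subnet_from_node_id is the single-subnet function already discussed; the contiguous subnet range is just one illustrative way to spread the assignments:

    ATTESTATION_SUBNET_COUNT = 64
    ESTIMATED_NETWORK_NODE_COUNT = 4500   # e.g. mainnet; 100 on prater
    MINIMUM_TARGET_SUBNET_BACKBONE = 50   # placeholder pending analysis

    def compute_subnets_from_node_id(node_id: int, epoch: int) -> list:
        # Expected nodes per subnet given the configured network-size estimate.
        nodes_per_subnet = max(1, ESTIMATED_NETWORK_NODE_COUNT // ATTESTATION_SUBNET_COUNT)
        # Number of subnets each node must cover to hit the backbone target.
        count = min(ATTESTATION_SUBNET_COUNT,
                    max(1, MINIMUM_TARGET_SUBNET_BACKBONE // nodes_per_subnet))
        base = compute_subnet_from_node_id(node_id, epoch)
        return [(base + i) % ATTESTATION_SUBNET_COUNT for i in range(count)]

With the mainnet numbers this returns a single subnet (4500 // 64 = 70 nodes per subnet, so the max(1, ...) floor applies), while with a 100-node testnet estimate it returns 50 subnets.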

djrtwo avatar Dec 01 '21 15:12 djrtwo

An increase in the number of hops would have an effect on whether a particular individual attestation can make it into the aggregate in time.

Fortunately, gossip hops generally scale with log(N), where N is the size of the network. In expectation, this likely won't affect delivery times in a well-structured/distributed network.

I would agree with @AgeManning that we should be using node-id here as we do for discovery in general. The node id in contrast is simply the keccak-256 hash of the uncompressed pubkey,

Agreed

djrtwo avatar Dec 01 '21 15:12 djrtwo

To counteract this -- I propose adding a configuration value (only updated at network upgrades) called ESTIMATED_NETWORK_NODE_COUNT (or something...) along with MINIMUM_TARGET_SUBNET_BACKBONE which then dictates how many subnets a node must be on for honest participation.

Why not adjust ATTESTATION_SUBNET_COUNT accordingly?

mkalinin avatar Dec 02 '21 07:12 mkalinin

This means that we simultaneously reduce the overpopulation of attnets by a small number of staking nodes

Is this such a terrible thing? Larger staking nodes are often highly specced, and will commonly have aggregating validators, so are both required to and capable of generating aggregate attestations in a timely and (rest of the network willing) relatively complete fashion. Given the makeup of aggregates in the average block at current, anything that would potentially reduce the effectiveness of aggregators would seem to be a retrograde step with potentially significant impact.

If nodes could still join more subnets, as per the current --subscribe-all-subnets flag or similar, it would avoid the situation of a large validating node potentially not subscribed to most attnets for which it is an aggregator (resulting in increased hops for traffic, higher latency, and associated Bad Things).

This would potentially weaken some of the ideas above, such as being able to generate the set of subnets a given node subscribes to from its node ID, but I think it would be an overall net positive.

mcdee avatar Dec 12 '21 22:12 mcdee

If nodes could still join more subnets, as per the current --subscribe-all-subnets flag or similar, it would avoid the situation of a large validating node potentially not subscribed to most attnets for which it is an aggregator (resulting in increased hops for traffic, higher latency, and associated Bad Things).

I think this is a client implementation detail. Clients should be fine managing their peers for the subnets they need ahead of time, and if there are enough peers on the subnets they should manage this gracefully. I would consider a large validating node not being subscribed to the attnets for which it is an aggregator to be a fault in the beacon node implementation.

As you've pointed out, there are potential downsides to changing the network structure, but it's hard to estimate what these actually are in practice without large-scale tests. I think this change is valuable enough that we test how it performs on a devnet, then testnets, etc.

Just to make it explicit, there is quite a large burden on large validating nodes. Being subscribed to every subnet all of the time defeats the purpose of having subnets at all: they have to validate and propagate everything. It could be beneficial to have a smaller number of nodes on each subnet, with each node less restricted by CPU/bandwidth.

In Lighthouse we'll keep the --subscribe-all-subnets flag for users who are not constrained by CPU/bandwidth, but I think it could potentially be nicer to not require this as a default.

AgeManning avatar Dec 13 '21 00:12 AgeManning

Agreed, Teku will also keep --subscribe-all-subnets, but it will require more and more resources to keep up with that as the number of validators grows.

Also given that validators are spread over 32 slots and 64 subnets, if you're running 2048 validators or more then you're probably subscribed to every subnet to support your aggregation duties anyway and that's probably true with even fewer validators since you'd likely subscribe a few slots before you need to aggregate.
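
(Concretely: 32 slots × 64 subnets = 2048 (slot, subnet) committee positions per epoch, so with roughly 2048 validators a node expects about one of its validators in every (slot, subnet) pair each epoch, and its aggregation duties end up touching essentially every subnet.)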

For small numbers of validators this likely has a bigger impact and reduces the load. Currently, for each of the first 64 or so validators you wind up subscribed to an additional subnet, plus potentially temporary subscriptions to aggregate, whereas with this change that burden is reduced significantly. Potentially that makes it more feasible for people to stake from home with a few validators rather than having to move to the cloud for bandwidth.

ajsutton avatar Dec 13 '21 00:12 ajsutton

but it will require more and more resources to keep up with that as the number of validators grows.

How so? There's a cap on the number of attestations / epoch in the system, and we've reached peak-attestation on mainnet already.

arnetheduck avatar Dec 14 '21 08:12 arnetheduck

How so? There's a cap on the number of attestations / epoch in the system, and we've reached peak-attestation on mainnet already.

Every validator attests every epoch, so if you're subscribed to all individual subnets, the more validators there are, the more individual attestation gossip messages you have to handle. There is no peak-attestation, only peak-aggregate.

ajsutton avatar Dec 14 '21 09:12 ajsutton

Right, I got the two confused it seems.

arnetheduck avatar Dec 14 '21 10:12 arnetheduck

This idea can be extended to sync committee subnets too, but with some modification such that only 1 in every N validators has to subscribe.

dapplion avatar Dec 15 '21 11:12 dapplion

Subnets entry points can be found in a relatively short time (< 1 minute) and short search overhead (within a few hops of the DHT)

@djrtwo I'm not actually sure I understand how the DHT could be utilized here (with just discv5 in mind). From my perspective, finding entry points would only be possible by iterating over known nodes and checking that the function applied to a nodeId results in the needed subnet.

Nashatyrev avatar Dec 24 '21 11:12 Nashatyrev

I'd recommend checking existing nodes first; going to the DHT (discv5) to find appropriate peers would be the backup when existing/known peers don't meet the requirement.

That said, it'd be good to pre-populate known peers with a diverse set of attnets.

I might be confused by your comment @ant, can you elaborate?

djrtwo avatar Dec 25 '21 00:12 djrtwo

Instead of having it based on the peer-id, have it based on the node-id (used in discovery; it can be calculated from the ENR and, in most cases, from the peer-id). If we base the subnet a node should be subscribed to on the first 8 bits of the node-id (i.e. the subnet number in binary), with the epoch rotating them in some way, then we can search for peers on a subnet by searching discovery for nodes closest to a node-id whose first 8 bits are the subnet-id we want. I think this would leverage the XOR metric in discovery and the query mechanism to find nodes based on this subnet more efficiently than randomly trawling the DHT (i.e. we can get discovery to specifically find nodes on a subnet). I haven't fully thought through the drawbacks in possible eclipsing however.

I think this is something worth considering and is a way to use the discv5 DHT (or rather the Kademlia XOR metric) to find nodes on subnets faster, i.e. minimize the search overhead.

AgeManning avatar Jan 10 '22 00:01 AgeManning

Yeah, that makes sense to me. Is there any reason peer-id is beneficial over node-id?

djrtwo avatar Jan 11 '22 14:01 djrtwo

I'd like to test and model this.

Anyone have the bandwidth to do either?

djrtwo avatar Jan 11 '22 14:01 djrtwo

I can write up a simulation at some point, verifying that the idea suggested above saves us discovery hops when traversing the DHT compared to randomly searching. I'll post results here.

I imagine we probably need to adjust our peer management a bit. If each node we connect to is only subscribed to one long-lived subnet, we might want to increase our default peer count to handle all 64 subnets and not have too much peer churn.

We probably need to agree on a function, perhaps something like:

def compute_subnet_from_node_id(node_id, epoch):
    # Top 6 bits of node_id select the base subnet; rotate once per SUBNET_DURATION_IN_EPOCHS.
    return ((node_id >> 250) + epoch // SUBNET_DURATION_IN_EPOCHS) % ATTESTATION_SUBNET_COUNT

Where node_id is the 32-byte node id and SUBNET_DURATION_IN_EPOCHS is something like 288 (approx. 1 day). The 250 gives us the first 6 bits, but it depends on the size of ATTESTATION_SUBNET_COUNT.
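
For example (a quick, non-normative sanity check, assuming the function and constants above are in scope and picking an arbitrary node_id), the node keeps one subnet for a full period and then steps to the next:

    SUBNET_DURATION_IN_EPOCHS = 288
    ATTESTATION_SUBNET_COUNT = 64

    node_id = 0xF3 << 248  # arbitrary example; top 6 bits are 0b111100 = 60
    for epoch in (0, 287, 288, 576):
        print(epoch, compute_subnet_from_node_id(node_id, epoch))
    # prints: 0 60, then 287 60, then 288 61, then 576 62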

Anyone, I"ll start a branch on Lighthouse, so make lighthouse ready if someone has bandwidth to start a testnet.

AgeManning avatar Jan 12 '22 02:01 AgeManning

I can write up a simulation at some point, verifying that the idea suggested above saves us discovery hops when traversing the DHT compared to randomly searching. I'll post results here.

Was doing some experiments with discv5 recently (live, not simulated), and DHT searching is indeed much more effective than random searching. I don't have exact numbers, but finding all known nodes with an id starting with a given 8-bit prefix takes about 10 seconds, while iterating over all nodes globally takes minutes of intensive discovery message exchange.

Nashatyrev avatar Jan 14 '22 09:01 Nashatyrev

Just ran a number of experiments with DHT searching for nodes with an 8-bit prefix:

  • The majority of live nodes are normally found within 3-4 hops. This could fit in 1 second.
  • All peers (with the prefix) are normally found within 100 requests. When done in parallel, it takes around 5-10 sec.

Nashatyrev avatar Jan 14 '22 10:01 Nashatyrev