consensus-specs icon indicating copy to clipboard operation
consensus-specs copied to clipboard

EIP-7594: Decouple network subnets from das-core

Open ppopth opened this issue 1 year ago • 15 comments

Currently we use subnets as a unit of custody in the PeerDAS core protocol because it doesn't make sense to partially custody only some columns in the subnets and waste the bandwidth to download the columns the node doesn't custody.

Since subnets correspond to GossipSub topics which are in a layer lower than the core protocol, using subnets as a unit of custody makes the core layer and the network layer too coupled to each other and leave no room for the network layer flexibility.

This commit introduces "custody groups" which are used as a unit of custody instead of subnets.

The immediate benefit of the decoupling is that we can immediately increase the number of subnets without affecting the expected number of peers to cover all columns and affecting the network stability and without touching the core protocol.

The reason we want to increase the number of subnets to match the number of columns is that the columns will be propagated through the network faster when they have their own subnets. Just like EIP-4844, each blob has its own subnet because, if all the blobs are in a single subnet, the blobs will be propagated more slowly.

Since we keep the number of custody groups the same as the previous number of subnets (32), the expected number of peers you need to cover all the columns is not changed. In fact, you need only NUMBER_OF_COLUMNS and NUMBER_OF_CUSTODY_GROUPS to analyze the expected number, which makes the core protocol completely decoupled from the network layer.

ppopth avatar Jul 03 '24 08:07 ppopth

I haven't done a detailed review of how you do the layering here, but I generally support this direction to ensure that core protocol and network are sufficiently decoupled so they can be tuned/optimized independently

djrtwo avatar Jul 03 '24 16:07 djrtwo

I think this is a good proposal. I believe it has something in common with former 'Network Shards' idea.

However it's worth to keep in mind that in this case the gossip 'backbone' is defined rather implicitly (by the upper das core layer) than explicitly. Changing for example NUMBER_OF_CUSTODY_GROUPS to 128 would affect not only the upper das core layer but would also affect the network layer by making subnet backbones too small and potentially unreliable/vulnerable.

Nashatyrev avatar Jul 04 '24 14:07 Nashatyrev

I like the idea of decoupling the concepts related to custody allocation from the delivery method.

My concern is that the only reason I know of for grouping columns in custody is the (supposedly) inefficiency of gossipsub (implementations) in handling a larger number of topics and having smaller topics. Anyway, I think the conceptual decoupling is good to have.

Another thing I see in the PR is that you also have FAQ extensions and alike. Haven't seen the details yet, I will try to give it a review later on.

cskiraly avatar Jul 09 '24 09:07 cskiraly

One more thing: parameter changes are also proposed in https://github.com/ethereum/consensus-specs/pull/3779 That one is proposing to change DATA_COLUMN_SIDECAR_SUBNET_COUNT to 128

cskiraly avatar Jul 09 '24 09:07 cskiraly

Changing for example NUMBER_OF_CUSTODY_GROUPS to 128 would affect not only the upper das core layer but would also affect the network layer by making subnet backbones too small and potentially unreliable/vulnerable.

From another point of view, that can be seen as a non-networking issue. Let me rephrase your issue. If the number of custody groups is too large, the number of nodes in each group will be so small that you cannot find the peers to cover all the custody groups.

I think whether it's an issue on the network layer is quite subjective. However, the whole point of doing DAS is to reduce the bandwidth consumption, so every part of the protocol (including the core one) can be seen as related to the network in some way.

ppopth avatar Jul 09 '24 10:07 ppopth

Another way to decouple the GossipSub topics from the core protocol is not to correspond subnets to GossipSub topics, but think of subnets as groups of nodes. However, we use the term "subnets" to mean GossipSub topics a lot in the past, so it's hard to think of subnets as anything else.

ppopth avatar Jul 09 '24 10:07 ppopth

One more thing: parameter changes are also proposed in #3779 That one is proposing to change DATA_COLUMN_SIDECAR_SUBNET_COUNT to 128

I have a big concern on that change. Let's discuss it in that PR.

ppopth avatar Jul 16 '24 14:07 ppopth

I miss the method which translates custody group into subnet, I think we need it or I don't understand an idea a bit. So instead of this

def compute_subnet_for_data_column_sidecar(column_index: ColumnIndex) -> SubnetID:
    return SubnetID(column_index % DATA_COLUMN_SIDECAR_SUBNET_COUNT)

we should have something like

def compute_subnet_for_custody_group(custody_group: UInt64) -> SubnetID:
    assert custody_group < NUMBER_OF_CUSTODY_GROUPS
    return SubnetID(custody_group % DATA_COLUMN_SIDECAR_SUBNET_COUNT)

And also it would be good to have compile assertion that NUMBER_OF_CUSTODY_GROUPS is set to the number, which could be always translated to DATA_COLUMN_SIDECAR_SUBNET_COUNT (say, we don't have 32 groups and 64 subnets or 33 groups and 32 subnets)

zilm13 avatar Jul 18 '24 19:07 zilm13

I miss the method which translates custody group into subnet, I think we need it or I don't understand an idea a bit. So instead of this

The idea is that you can get the subnets by calling compute_columns_for_custody_group and then compute_subnet_for_data_column_sidecar. Does that make sense?

We don't want to do compute_subnet_for_custody_group because it will allows only one subnet for each custody group.

And also it would be good to have compile assertion that NUMBER_OF_CUSTODY_GROUPS is set to the number, which could be always translated to DATA_COLUMN_SIDECAR_SUBNET_COUNT (say, we don't have 32 groups and 64 subnets or 33 groups and 32 subnets)

What do you mean? NUMBER_OF_CUSTODY_GROUPS is independent from DATA_COLUMN_SIDECAR_SUBNET_COUNT and only the assertion that I can think of is that DATA_COLUMN_SIDECAR_SUBNET_COUNT should be greater than or equal to NUMBER_OF_CUSTODY_GROUPS.

we don't have 32 groups and 64 subnets

Absolutely, we can have 32 groups and 64 subnets.

ppopth avatar Jul 25 '24 04:07 ppopth

The idea is that you can get the subnets by calling compute_columns_for_custody_group and then compute_subnet_for_data_column_sidecar

@ppopth now I got this, thank you for the explanation

zilm13 avatar Jul 30 '24 08:07 zilm13

I think this is ready to be merged?

ppopth avatar Aug 14 '24 13:08 ppopth

I know this proposal has been around for a while, and I actually reviewed it earlier, but I have some additional thoughts.

I’m trying to wrap my head around the benefits of this proposal. It looks like the idea is to allow us to tweak the subnet count without touching the core protocol (custody_requirement) and its security aspects, which makes sense given where PeerDAS is right now. But I’m wondering if the benefits we gain from this would really justify the added complexity, and whether this change is something we truly need at this stage.

Personally, I’m not entirely convinced that this change is necessary. The more we decouple things, the more complexity we introduce, which might not be ideal. Since we're already looking at up to 128 subnets, it feels like we won’t need to go beyond that in the near future.

I’ve heard that research shows it can be more efficient to spread larger chunks of data across multiple subnets rather than sticking to just one. So, if we end up increasing the blob count significantly, I can see how increasing the subnet count might make sense. But for now, I’m not sure there’s a big advantage in going above 128 subnets unless we’re dealing with a much higher blob count. Even with 256 blobs per block, we’re still only sending up to 512kb per block to each subnet.

Maybe there’s something I’m missing here and about the future plans - Is there an issue with the current spec, or are we just trying to future-proof for scenarios we’re not certain about yet?

jimmygchen avatar Aug 21 '24 02:08 jimmygchen

@jimmygchen Sorry for the late reply.

I would like to compare this PR to attestation subnets and attestation committees. In the phase0 spec, we decoupled subnets from committees as well. In real life and as I always heard, no one really mentions committees. People always talk about attestation subnets. In fact, according to the spec, the phase0 core protocol has nothing to do with subnets. They only deal with committees. The subnets are in the network layer, another layer.

So in PeerDAS, people may still discuss things in term of subnets (just like the attestation subnets), but I think it doesn't add more complexity since phase0 is already complex (according to your standard, some people may say it's not complex).

ppopth avatar Sep 10 '24 14:09 ppopth

Thanks for the context @ppopth.

It's a fair comparison with some differences. Attestations have a target committee size, while with DAS, we’ve got a fixed number of columns/groups. And repeating my earlier comment, I'm not sure we need to design this with increasing to over 128 subnets in mind at this stage, or if there's any reason to reduce the number of subnets after shipping PeerDAS.

I feel there might be some tension, but I just wanted to pass along our team's feedback and keep things moving since we talked about scheduling this for peerdas-devnet-3. I’m not saying the proposal is overly complex, just that it adds more complexity than not having it. But even compared to attestation committees, I’m not sure the benefits are clear right now. It feels less urgent compared to other priorities like validator custody or optimizing cell proof computation and bandwidth. That’s just my take - I just wanted to share our feedback. It’s not something I’m going to fight hard on.

jimmygchen avatar Sep 11 '24 03:09 jimmygchen

@jimmygchen Thank you for the feedback.

I'm not sure we need to design this with increasing to over 128 subnets in mind at this stage, or if there's any reason to reduce the number of subnets after shipping PeerDAS.

I think it's reasonable to make the spec agile to changes even if we don't know clearly yet if we want to change the number of subnets or not. Again, just like attestation subnets, the number of subnets and the number of committees have been both 64 for a long time and we don't see whether this will ever change.

That reason also applies here. The number of column subnets will probably be equal to the number of custody groups forever, but it has value in the view of agility.

What do you think?

ppopth avatar Sep 11 '24 16:09 ppopth

It seems that everyone in this PR is already convinced that this PR is useful. So it means it's ready to be merged?

ppopth avatar Oct 29 '24 10:10 ppopth