bee icon indicating copy to clipboard operation
bee copied to clipboard

feat: kademlia full bins

Open istae opened this issue 2 years ago • 5 comments

Checklist

  • [x] I have read the coding guide
  • [ ] My change requires a documentation update and I have done it
  • [x] I have added tests to cover my changes.

Description

Please see https://github.com/ethersphere/bee/issues/3150

Motivation and context (Optional)

Related Issue (Optional)

Screenshots (if appropriate):


This change is Reviewable

istae avatar Aug 02 '22 18:08 istae

So I think this change was already discussed sometime back. I don't think this is a silver bullet. There is saturation and oversaturation and they mean different things. I feel there has to be some difference in the oversaturation and saturation as this would allow node to connect to new nodes. If node is already oversaturated it would reject the new connection no? Apart from bootnodes. I am not sure if this is what we want.

I think this is a research topic to determine the saturation and oversaturation count. We would need to observe the network after this change to see if it affects things. But I would like to get an input from research team regardless.

See https://github.com/ethersphere/bee/issues/3150 for more details.

We use saturation for depth calculation and oversaturation for maximum allowed peers in a bin outside of depth. There is no reason for bin 0 to have peers below 20, so kademlia should be more aggressive with connections.

If node is already oversaturated it would reject the new connection no?

Yes, but, someone else would make an inbound connection to the new node. It balances out. The direction of connections does not matter, essentially.

istae avatar Aug 03 '22 08:08 istae

Also, the previous change, which increased the suffix length for the balance connector, is a good change but does not make the bin reach oversaturation directly.

istae avatar Aug 03 '22 08:08 istae

So I think this change was already discussed sometime back. I don't think this is a silver bullet. There is saturation and oversaturation and they mean different things. I feel there has to be some difference in the oversaturation and saturation as this would allow node to connect to new nodes. If node is already oversaturated it would reject the new connection no? Apart from bootnodes. I am not sure if this is what we want. I think this is a research topic to determine the saturation and oversaturation count. We would need to observe the network after this change to see if it affects things. But I would like to get an input from research team regardless.

See #3150 for more details.

We use saturation for depth calculation and oversaturation for maximum allowed peers in a bin outside of depth. There is no reason for bin 0 to have peers below 20, so kademlia should be more aggressive with connections.

If node is already oversaturated it would reject the new connection no?

The issue itself does not give enough reason for this increase. There are many problems with adding more connections. Lot more resources, file descriptors etc. This needs to be justified with improvement from all the other places. I dont see this change as simply changing values. If we add peers there are more failures, more retries etc. Also, I am pretty sure this was discussed earlier and we concluded this is not the right change. Not sure what has changed?

Currently, each peer has around 130 connections on the mainnet. I dont think this is a problem right now. If it is, I see it as a research problem to figure out what is the right number we should be at. If we think it is higher, we can always change this value.

Yes, but, someone else would make an inbound connection to the new node. It balances out. The direction of connections does not matter, essentially.

These assumptions may not be correct. This is why this is a research topic.

aloknerurkar avatar Aug 03 '22 09:08 aloknerurkar

There are many problems with adding more connections.

I am not proposing an increase in maximum number of connections. If you can remember to prior to like a few months ago when the network had more activity (eg new nodes connecting, old nodes rebooting/dropping), our bins were almost always oversaturated. Because the current state is that the network is static, nodes are not reaching 20 connections per bin.

This needs to be justified with improvement from all the other places. If we add peers there are more failures, more retries etc.

Theoretically, a more connected network allow for fewer hops for the protocols, so less error rate, fewer retries.

istae avatar Aug 03 '22 09:08 istae

There are many problems with adding more connections.

I am not proposing an increase in maximum number of connections. If you can remember to prior to like a few months ago when the network had more activity (eg new nodes connecting, old nodes rebooting/dropping), our bins were almost always oversaturated. Because the current state is that the network is static, nodes are not reaching 20 connections per bin.

So if kademlia aggressively tries to connect to more peers this would increase the connections, right? If there was more activity on the network, they would have more peers in their bins. But right now they might not need to connect to more nodes.

This needs to be justified with improvement from all the other places. If we add peers there are more failures, more retries etc.

Theoretically, a more connected network allow for fewer hops for the protocols, so less error rate, fewer retries.

So theoretically, even with saturationPeers we should have enough connectivity to satisfy the protocols and even then we see a lot of errors/timeouts/stream resets. I also don't see allowing more hops as a bad idea. This would be good testing for when the network would be even bigger.

I definitely agree this can be calibrated, but then we should get a number that should be right, and then we need to test the behavior on a large cluster, constantly monitoring the resource, connections, protocols metrics etc. Before doing all this, I think its a good idea to take input from the research team. This would be my suggestion.

Considering all the breaking changes and things we have planned, it would be better if we do this later when we have the correctness sorted.

aloknerurkar avatar Aug 03 '22 11:08 aloknerurkar