
Auto-Neighborhood Balancing Feature

Open NoahMaizels opened this issue 10 months ago • 6 comments

Summary

Currently, when a node joins the network, it uses the Swarmscan neighborhood selector tool (the default value for `neighborhood-suggester`) to choose a neighborhood from among the least populated ones. This works well for the initial provisioning of nodes, but as neighborhood populations shift over time, the neighborhood a node finds itself in may no longer be ideal.

Currently, we recommend that node operators periodically check their node's neighborhood and hop to a new one if needed. However, I doubt that many node operators actually follow this practice. (I just posted a poll on this matter in Discord; if my assumption turns out to be wrong, this may be less of an issue than I thought.)

I believe that periodic automatic neighborhood balancing should be a feature that is on by default, with a sensible default configuration that decides whether to hop based on the population of the node's current neighborhood and the current distribution of nodes across neighborhoods throughout the network.

Potential options with some sensible default values included:

neighborhood-rebalancing: <bool> # Defaults to `true`
neighborhood-rebalancing-interval: <int> # Days to wait until rechecking if a neighborhood hop is needed - default of 15?
max-neighbors: <int> # The maximum number of neighbors allowed. If exceeded, the `neighborhood-suggester` is used to hop to a new neighborhood. 

Perhaps the max-neighbors option should not be included at all. Instead, we could use the results from the Swarmscan /network/neighborhoods endpoint to calculate a reasonable number to use when checking whether to hop?
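As a rough illustration of deriving the threshold from the network-wide distribution rather than a fixed max-neighbors value, here is a minimal Python sketch. It assumes the per-neighborhood node counts have already been obtained somehow (the response format of the Swarmscan endpoint is not described here, so `populations` is just a plain list of counts). The function name `should_hop` and the `tolerance` factor of 1.5 are hypothetical and would need careful tuning.

```python
from statistics import median


def should_hop(own_population: int, populations: list[int], tolerance: float = 1.5) -> bool:
    """Return True if our neighborhood looks crowded relative to the network.

    own_population: number of nodes in the node's current neighborhood.
    populations:    node counts for all known neighborhoods (hypothetical input).
    tolerance:      illustrative factor; hop only if we exceed tolerance * median.
    """
    if not populations:
        # Without network-wide data there is no basis for hopping.
        return False
    return own_population > tolerance * median(populations)
```

Comparing against the median rather than the mean keeps a few very crowded neighborhoods from inflating the threshold, but that choice is an assumption of this sketch, not something proposed in the issue.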

Motivation

There are several good reasons for this feature:

  1. Easier UX for node operators - it allows them to automatically ensure their nodes will always move to the most profitable neighborhoods
  2. Improves safety for data on the network - ensures that no neighborhoods are allowed to become underpopulated

Implementation

I don't have the ability to implement it.

Drawbacks

  • Defaults for neighborhood-rebalancing-interval and max-neighbors need to be selected carefully, since each hop causes a full re-sync of the new neighborhood and an increase in bandwidth consumption, which might have unintended negative effects on the network.
  • Nodes may hop simultaneously without knowing about each other's hops, leading to unintended consequences.
  • Each hop incurs a 2-round delay during which the node is not eligible to participate in redistribution.

NoahMaizels, Feb 05 '25 22:02

There would need to be some mechanism to prevent too many nodes switching at the same time - and possibly therefore also "landing" in the same neighbourhoods.

crtahlin, Feb 07 '25 16:02

> There would need to be some mechanism to prevent too many nodes switching at the same time - and possibly therefore also "landing" in the same neighbourhoods.

Perhaps something like assigning, by default, a random interval for when nodes check whether to hop, for example an interval chosen randomly between 20 and 40 days?
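A minimal Python sketch of that randomized interval, just to make the idea concrete. The function name `next_check_delay` and its parameters are hypothetical; the 20 to 40 day range is the one floated above, not a tested default.

```python
import random

SECONDS_PER_DAY = 24 * 60 * 60


def next_check_delay(min_days: float = 20, max_days: float = 40, rng=None) -> float:
    """Return a random delay, in seconds, until the next hop check.

    Each node draws its own delay independently, so checks spread out over
    time without any coordination between nodes.
    """
    if min_days > max_days:
        raise ValueError("min_days must not exceed max_days")
    if rng is None:
        rng = random.Random()
    return rng.uniform(min_days, max_days) * SECONDS_PER_DAY
```

Drawing a fresh delay after every check, rather than once at startup, would keep nodes that started together from staying in lockstep.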

NoahMaizels, Feb 13 '25 07:02

> There would need to be some mechanism to prevent too many nodes switching at the same time - and possibly therefore also "landing" in the same neighbourhoods.
>
> Perhaps something like assigning, by default, a random interval for when nodes check whether to hop, for example an interval chosen randomly between 20 and 40 days?

That would probably work, while still keeping things decentralised.

crtahlin, Feb 13 '25 08:02

One way to pick a least populated neighborhood is for the node to poll the status of each of its peers using the status protocol, use the neighborhood size field to pick the smallest neighborhood, and mine a new overlay for it. The node might not find the least populated neighborhood in the whole network right away, but it will get close, and it may get closer with each migration. The interval between migrations should be a random number of days, say 15-30.
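The selection step could look roughly like the Python sketch below. It is only an illustration: `PeerStatus` and its fields are hypothetical stand-ins for whatever the status protocol actually returns, and polling the peers and mining the new overlay are not shown. When several peers report on the same neighborhood, the sketch keeps the largest reported size, on the assumption that a peer is more likely to undercount its neighbors than to overcount them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerStatus:
    neighborhood: str       # hypothetical neighborhood identifier (e.g. a bit prefix)
    neighborhood_size: int  # neighborhood size as reported by the peer


def pick_smallest_neighborhood(statuses, rng=None):
    """Return the neighborhood with the smallest reported size, or None."""
    if rng is None:
        rng = random.Random()
    largest_report = {}
    for status in statuses:
        # Keep the largest size any peer reported for each neighborhood.
        previous = largest_report.get(status.neighborhood, 0)
        largest_report[status.neighborhood] = max(previous, status.neighborhood_size)
    if not largest_report:
        return None
    smallest = min(largest_report.values())
    candidates = sorted(n for n, size in largest_report.items() if size == smallest)
    # Break ties randomly so that nodes with the same view do not all
    # land in the same neighborhood.
    return rng.choice(candidates)
```

Because a node only sees its own peers, the result is a local estimate, which matches the expectation above that the node gets closer to the true minimum over successive migrations.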

Another tricky thing to get right is, once the new neighborhood is found, restarting the node and all of its services without user intervention, and correctly pruning the stored data that will be resynced in the new neighborhood.

istae, Feb 25 '25 14:02

> There would need to be some mechanism to prevent too many nodes switching at the same time - and possibly therefore also "landing" in the same neighbourhoods.

Maybe an even worse situation is when nodes hop away from the same neighborhood at the same time and the data within that neighborhood is lost.

nugaon, Feb 26 '25 10:02

https://github.com/ethersphere/bee/issues/4923 same issue

istae, Mar 11 '25 13:03