Cap validator count at 1m with sortition
One of the weaknesses of the current spec is the very high variance in client load: clients need to be designed to potentially handle an extremely high load supporting 4 million validators, but real-life load is likely to be much lower (expected to be ~100x lower close to launch). This means that validator operators must either get a powerful computer "just in case" or run the risk of being unable to keep up if far more validators join than expected.
This post describes one possible solution to this conundrum: capping the number of active validators at 1 million (actually 2**20) in a randomized and fair (and hence unexploitable) way.
Outline
Add a "dormant" state (in addition to "awaiting activation" / "activated" / "exited" / "withdrawn"); this would be done either by adding a sleep_epoch and wake_epoch or by adding a dormant:bool and dormancy_transition_epoch (we need to store it as epochs to preserve the invariant that all state changes are predictable by 4 epochs). Dormant validators can skip the queue to exit.
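The sleep_epoch / wake_epoch encoding could be sketched as follows (a minimal sketch: the Validator fields other than sleep_epoch and wake_epoch, and the helper function names, are illustrative; FAR_FUTURE_EPOCH is the usual spec sentinel for "not scheduled"):

```python
from dataclasses import dataclass

FAR_FUTURE_EPOCH = 2**64 - 1  # spec sentinel for "no transition scheduled"

@dataclass
class Validator:
    # Illustrative subset of a validator record, extended with the
    # two proposed dormancy fields
    activation_epoch: int = FAR_FUTURE_EPOCH
    exit_epoch: int = FAR_FUTURE_EPOCH
    sleep_epoch: int = FAR_FUTURE_EPOCH   # epoch at which it goes dormant
    wake_epoch: int = FAR_FUTURE_EPOCH    # epoch at which it is reactivated

def is_dormant(v: Validator, epoch: int) -> bool:
    # Dormant between its scheduled sleep and its scheduled wake-up
    return v.sleep_epoch <= epoch < v.wake_epoch

def is_active(v: Validator, epoch: int) -> bool:
    return v.activation_epoch <= epoch < v.exit_epoch and not is_dormant(v, epoch)
```

Because both fields are epochs set in advance, the invariant that all state changes are predictable 4 epochs ahead is preserved.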
If the total number of active validators at the current time exceeds 2**20, then add the following rules:
- Validators that are activated via the activation queue are instead moved into the dormant state.
- Let N be the number of validators that normally would be activated via the activation queue mechanism assuming 2**20 active validators (currently that's 16). At the end of each epoch, N randomly selected dormant validators are activated, and N randomly selected active validators are made dormant.
The random selection is important, because it ensures that (i) an attacker cannot join with new validators and replace existing participants without being equally diluted themselves, and (ii) there is no benefit from exiting-then-reentering.
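The end-of-epoch swap rule above could be sketched as follows (a hedged sketch: the function name, the CHURN constant, and the use of Python's random module are illustrative; in the actual protocol the selection would be driven by protocol randomness such as RANDAO, not a local RNG):

```python
import random

VALIDATOR_CAP = 2**20
CHURN = 16  # N: per-epoch churn assumed at 2**20 active validators

def process_rotation(active: list, dormant: list, rng=random) -> None:
    # At the end of each epoch, wake N randomly selected dormant
    # validators and put N randomly selected active validators to
    # sleep (modifies both lists in place).
    n = min(CHURN, len(dormant))
    woken = [dormant.pop(rng.randrange(len(dormant))) for _ in range(n)]
    slept = [active.pop(rng.randrange(len(active))) for _ in range(n)]
    active.extend(woken)
    dormant.extend(slept)
```

Because both directions of the swap are uniformly random, joining with many fresh validators dilutes the attacker's own chance of staying active just as much as everyone else's.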
Economic effects
In the case where there are more than 2**20 validators, this proposal would have two sets of consequences for validators. First, per-validator rewards would drop by 1% per 1% gain in staking participation (instead of the status quo: a 0.5% drop per 1% gain in participation). Second, validators' costs would go down, because validators would be offline some of the time, and validators would have more freedom to remove some of their funds reliably, reducing implied capital costs.
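The reward arithmetic can be made concrete with a back-of-the-envelope sketch (an assumption-laden illustration: it takes total issuance to scale with the square root of the validator count, which is where the 0.5%-per-1% status-quo figure comes from; the function names and the base constant are hypothetical):

```python
CAP = 2**20

def reward_uncapped(n, base=1.0):
    # Status quo: total issuance ~ sqrt(n), so per-validator
    # reward ~ 1/sqrt(n)
    return base / n**0.5

def reward_capped(n, base=1.0):
    # At the cap: only CAP validators are active at once, and each
    # validator is active a CAP/n fraction of the time, so the
    # expected per-validator reward ~ 1/n
    return (base / CAP**0.5) * min(1.0, CAP / n)

# Effect of a 1% gain in participation, starting well above the cap
n = 2 * CAP
drop_uncapped = 1 - reward_uncapped(n * 1.01) / reward_uncapped(n)  # ~0.5%
drop_capped = 1 - reward_capped(n * 1.01) / reward_capped(n)        # ~1.0%
```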
Note that the transition between the under-the-cap regime and the at-the-cap regime is gradual, because if the number of total participating validators is only slightly above 2**20 then any validator that is forced into dormancy can expect to be woken back up very quickly.
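This gradualness can be quantified with a rough estimate (a sketch assuming N = 16 wake-ups per epoch as above; the function name is illustrative): with D = total - 2**20 dormant validators and N woken per epoch, a freshly dormant validator waits about D / N epochs on average.

```python
CAP = 2**20
N = 16  # dormant validators woken per epoch

def expected_dormancy_epochs(total_validators):
    # With D excess validators and N woken per epoch, the average
    # wait before being woken back up is roughly D / N epochs
    d = max(0, total_validators - CAP)
    return d / N

# Only 1024 validators over the cap: ~64 epochs of dormancy
# (~7 hours at 6.4-minute epochs) before being woken
wait = expected_dormancy_epochs(CAP + 1024)
```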
Possible extensions
- Use 2**19 (524288 validators, ~16.7m ETH) as the cap instead of 2**20.
- We can make the rotation happen faster by rotating a fixed percentage of validators (eg. 1/64) every time the chain finalizes. This allows us to rotate validators quickly without violating BFT set intersection invariants that would cause a reduction in safety.
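Under the second extension, turnover of the active set is geometric: each finalization independently replaces a random 1/64 of the set, so the expected surviving fraction of the original set decays as (63/64)^k (a sketch; the function name is illustrative).

```python
ROTATION_FRACTION = 1 / 64

def original_share(k):
    # Expected fraction of the original active set still active
    # after k finalized rotations of 1/64 each
    return (1 - ROTATION_FRACTION) ** k

# Roughly half the original set has been rotated out after ~44
# finalizations
share_44 = original_share(44)
```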
Simulation code
Here is some quick simulation code that shows what happens if there are 100 active validators and 50 new ones join, assuming a cap of 100. The distribution quickly stabilizes around the optimal split (~67 old, ~33 new).
import random

active = list(range(100))
dormant = []
for i in range(200):
    # For the first 50 rounds, one new validator joins (as dormant)
    if i < 50:
        dormant.append(100 + i)
    # Rotate: a random active validator goes dormant, then a random
    # dormant validator is woken
    dormant.append(active.pop(random.randrange(len(active))))
    active.append(dormant.pop(random.randrange(len(dormant))))
    print("Active: {} original {} new".format(
        len([x for x in active if x < 100]),
        len([x for x in active if x >= 100])
    ))
Generally agreed. I think this is sensible to get into the proposed "first" fork that might include adjusting punitiveness, light client support, etc.
There are a few places of concern wrt load as the validator set size grows. One is memory (which this proposal does not solve), but the other is attestation load, manifesting as high bandwidth usage and verification cost on the wire. Additionally, high attestation load might translate into more issues with packing non-optimal aggregates into blocks.