lighthouse
lighthouse copied to clipboard
Reduce frequency of polling unknown validators to avoid overwhelming the Beacon Node
Issue Addressed
Addresses #4388.
A large number of unknown validators on a VC is known to overwhelm the beacon node because the VC triggers a retrieval for each validator per slot. This PR reduces the frequency to one query per unknown validator each epoch.
Proposed Changes
- Poll for all validator indices on startup (unchanged)
- If any validator is unknown, register to poll again in the 32 slots (1 epoch) instead of the next slot.
- Avoid polling on the first 1st slot of epoch.
For more details and rationale, see comment here.
I'll do a bit of manual testing next week before we merge this!
@chong-he found an issue during testing and it looks like the VC re-queries the BN during the first slot of the epoch, I'll look into this.
Upon investigating the logs, I think it is working as intended, and the poll in the first slot of epoch is skipped. Will wait for CK to confirm.
However, I think there's likelihood that this could happen due to the async nature of this function, in the scenario where BN returns a response really late - basically if iterating through all the validators takes close to 12 seconds, then we could be querying in a new slot, which could potentially be first slot of a new epoch.
This is because of the calculation outside the for loop: https://github.com/sigp/lighthouse/blob/ce914d191d804a765c573843c3c7448facd04ccb/validator_client/src/duties_service.rs#L483-L502
For more accuracy, I'll move the calculation into the for loop given we await on each validator query and the slot status could be inaccurate if we have a large list or slow BN.
@chong-he found an issue during testing and it looks like the VC re-queries the BN during the first slot of the epoch, I'll look into this.
Apologies! I was looking at the wrong slot time.
The VC does not re-query on the first slot in the next epoch (even though we start the VC in the first slot). It waits until the 12s has lapsed and only re-query the BN about the status (i.e., query at the second slot of an epoch). The VC also only query once per epoch, which is a great improvement.
All good upon testing, this PR is good to go
sorry looks like cargo fmt is failing because I messed up the indentation in my suggestion
My bad, should've checked myself :P thanks!
@mergifyio queue
queue
✅ The pull request has been merged automatically
The pull request has been merged automatically at 52e31121df5ba74442d37ed9ec1812ee0bdad44a