Signature caching in the VC
Description
@jmcruz1983 (Juan) has pointed out that Lighthouse is doing orders of magnitude more signing requests to Web3Signer than Teku. At scale (e.g., thousands of validators), this can overload infrastructure and cause real problems.
Based on some data from Juan, I suspect this is caused by duplicate signing of selection proofs (I'm not sure if it's for attestations or sync messages).
I have a proposed solution that should be simple to implement and will:
- Definitely be useful for Juan to test on his infra to see if it resolves the issue.
- Probably be a tenable long-term solution.
Proposed Solution
- In the `validator_store`, create a `signature_cache: SignatureCache<T>(HashMap<(T, SigningContext), Signature>)` struct.
- Attached to the `ValidatorStore` is one `signature_cache` (probably wrapped in an `RwLock`) for `produce_selection_proof` and one for `produce_sync_selection_proof`.
- When a selection proof is requested, we check the cache to see if it already exists. If so, we return early with that signature.
- After we create a signature (because it wasn't in the cache), we add it to the cache.
- If, during the cache add, we discover that the cache is over a certain size (64?), then we prune the entry with the lowest slot. A rough sketch of this is below.
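A minimal sketch of what this could look like, assuming a simple `HashMap`-backed cache. The `Slot`, `Signature`, and `SigningContext` stand-ins below are hypothetical placeholders rather than Lighthouse's actual types, and `remote_sign` stands in for the real Web3Signer round-trip:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-ins for the real Lighthouse types.
type Slot = u64;
type Signature = Vec<u8>;

/// Hypothetical key: whatever uniquely identifies a signing request
/// (the real `SigningContext` carries the domain, epoch, etc.).
#[derive(Clone, PartialEq, Eq, Hash)]
struct SigningContext {
    slot: Slot,
    domain: [u8; 32],
}

/// Prune once the cache grows past this many entries (64, per above).
const MAX_CACHE_SIZE: usize = 64;

#[derive(Default)]
struct SignatureCache {
    map: HashMap<SigningContext, Signature>,
}

impl SignatureCache {
    /// Return a previously produced signature for this context, if any.
    fn get(&self, ctx: &SigningContext) -> Option<Signature> {
        self.map.get(ctx).cloned()
    }

    /// Store a fresh signature, evicting the lowest-slot entry when the
    /// cache exceeds `MAX_CACHE_SIZE`.
    fn insert(&mut self, ctx: SigningContext, sig: Signature) {
        self.map.insert(ctx, sig);
        while self.map.len() > MAX_CACHE_SIZE {
            if let Some(oldest) = self.map.keys().min_by_key(|k| k.slot).cloned() {
                self.map.remove(&oldest);
            }
        }
    }
}

/// Check-then-sign flow: hit the cache first, only go to the remote
/// signer on a miss, then populate the cache.
fn produce_selection_proof(cache: &RwLock<SignatureCache>, ctx: SigningContext) -> Signature {
    if let Some(sig) = cache.read().unwrap().get(&ctx) {
        return sig; // Cache hit: no signing request is made.
    }
    let sig = remote_sign(&ctx); // Stand-in for the Web3Signer request.
    cache.write().unwrap().insert(ctx, sig.clone());
    sig
}

fn remote_sign(_ctx: &SigningContext) -> Signature {
    vec![0u8; 96] // Placeholder; the real call signs via Web3Signer.
}
```

Note that the read-then-write pattern can still double-sign if two threads race on the same context, but the duplicate produces an identical signature, so the cost is just one redundant request.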
Oh wait, I just realised that those `signature_cache`s need to be per validator. Perhaps attaching them to the `SigningMethod` would be more appropriate.
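Continuing the sketch above, a per-validator layout might simply hang the two caches off each signing method. This is not Lighthouse's actual `SigningMethod` definition, just an illustration of the idea:

```rust
use std::sync::RwLock;

// Hypothetical per-validator layout, reusing `SignatureCache` from the
// sketch above: each signing method owns its own caches rather than
// sharing one map keyed by validator on the `ValidatorStore`.
struct SigningMethod {
    // ... keystore handle or Web3Signer client (elided) ...
    selection_proof_cache: RwLock<SignatureCache>,
    sync_selection_proof_cache: RwLock<SignatureCache>,
}
```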
We already pre-compute all the sync selection proof signatures. It could be that this burst of signing is what shows up on Web3Signer's end.
Alternatively, if we do adopt a cache, we can probably drop the signature pre-compute, as I think the cache would make it obsolete.
Closing since #3223 implemented this feature and saw little to no benefit.