Store pubkey cache decompressed on disk
Description
Currently Lighthouse takes a long time to start up while it reloads the public key cache from disk.
The fundamental reason for this is that it stores the BLS pubkeys in their compressed form and then has to decompress every one of them for the in-memory cache:
https://github.com/sigp/lighthouse/blob/18c61a5e8be3e54226a86a69b96f8f4f7fd790e4/beacon_node/beacon_chain/src/validator_pubkey_cache.rs#L166-L169
Just changing that PublicKeyBytes to PublicKey would not be sufficient, because the two types share the same compressed SSZ representation:
https://github.com/sigp/lighthouse/blob/18c61a5e8be3e54226a86a69b96f8f4f7fd790e4/crypto/bls/src/macros.rs#L29-L50
https://github.com/sigp/lighthouse/blob/18c61a5e8be3e54226a86a69b96f8f4f7fd790e4/crypto/bls/src/impls/blst.rs#L121-L124
So we would need to create new methods on bls::GenericPublicKey, like serialize_uncompressed and deserialize_uncompressed, which use the 96-byte uncompressed serialization of the PublicKey rather than the 48-byte compressed serialization. For blst, we can use the PublicKey::serialize function. For milagro, we can use PublicKey::as_uncompressed_bytes. Both libraries also include deserialization functions that act on the uncompressed bytes.
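A rough sketch of what those wrapper methods could look like. Only the names serialize_uncompressed and deserialize_uncompressed come from the proposal above. The UncompressedPubkey trait, the simplified GenericPublicKey struct, the from_point constructor and MockPoint are assumptions made up for illustration and do not match the real bls crate. MockPoint just stores raw bytes and does no curve arithmetic, so the example compiles without blst or milagro.

```rust
/// Length in bytes of an uncompressed BLS12-381 G1 public key.
pub const PUBLIC_KEY_UNCOMPRESSED_BYTES_LEN: usize = 96;

/// Hypothetical backend trait: each crypto library (blst, milagro) would
/// implement this by calling its own uncompressed (de)serialization routine.
pub trait UncompressedPubkey: Sized {
    fn serialize_uncompressed(&self) -> [u8; PUBLIC_KEY_UNCOMPRESSED_BYTES_LEN];
    fn deserialize_uncompressed(bytes: &[u8]) -> Result<Self, String>;
}

/// Simplified stand-in for the generic wrapper type (not the real struct).
pub struct GenericPublicKey<P> {
    point: P,
}

impl<P: UncompressedPubkey> GenericPublicKey<P> {
    /// Hypothetical constructor, only here so the sketch can be exercised.
    pub fn from_point(point: P) -> Self {
        Self { point }
    }

    /// Delegate to the backend's 96-byte encoding.
    pub fn serialize_uncompressed(&self) -> [u8; PUBLIC_KEY_UNCOMPRESSED_BYTES_LEN] {
        self.point.serialize_uncompressed()
    }

    /// Check the length up front, then let the backend parse the point.
    pub fn deserialize_uncompressed(bytes: &[u8]) -> Result<Self, String> {
        if bytes.len() != PUBLIC_KEY_UNCOMPRESSED_BYTES_LEN {
            return Err(format!("invalid length: {}", bytes.len()));
        }
        Ok(Self {
            point: P::deserialize_uncompressed(bytes)?,
        })
    }
}

/// Mock backend for demonstration only: holds the raw bytes, no validation.
pub struct MockPoint(pub [u8; PUBLIC_KEY_UNCOMPRESSED_BYTES_LEN]);

impl UncompressedPubkey for MockPoint {
    fn serialize_uncompressed(&self) -> [u8; PUBLIC_KEY_UNCOMPRESSED_BYTES_LEN] {
        self.0
    }

    fn deserialize_uncompressed(bytes: &[u8]) -> Result<Self, String> {
        if bytes.len() != PUBLIC_KEY_UNCOMPRESSED_BYTES_LEN {
            return Err("wrong length for uncompressed key".to_string());
        }
        let mut out = [0u8; PUBLIC_KEY_UNCOMPRESSED_BYTES_LEN];
        out.copy_from_slice(bytes);
        Ok(MockPoint(out))
    }
}
```

A real backend would presumably also need to decide whether deserialize_uncompressed performs the usual point validity checks or trusts the bytes it reads back from its own database.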
Once these methods are in place in the bls wrapper lib, we can call them from the DatabasePubkey wrapper. We'll need two versions of DatabasePubkey in order to implement the schema migration from compressed keys to uncompressed keys.
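To make the two-version idea concrete, here is a minimal sketch of the migration step. DatabasePubkeyV1, DatabasePubkeyV2 and migrate_pubkeys are hypothetical names, not the actual Lighthouse types, and the decompression routine is passed in as a closure so the sketch stays independent of any particular BLS library.

```rust
pub const COMPRESSED_LEN: usize = 48;
pub const UNCOMPRESSED_LEN: usize = 96;

/// Old schema (hypothetical name): key stored as 48 compressed bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct DatabasePubkeyV1(pub [u8; COMPRESSED_LEN]);

/// New schema (hypothetical name): key stored as 96 uncompressed bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct DatabasePubkeyV2(pub [u8; UNCOMPRESSED_LEN]);

/// Convert every stored key from the old schema to the new one.
///
/// `decompress` is assumed to wrap the BLS library's point decompression.
/// The first failure aborts the migration and returns the error, so a
/// partially converted set is never handed back to the caller.
pub fn migrate_pubkeys<F>(
    old: &[DatabasePubkeyV1],
    decompress: F,
) -> Result<Vec<DatabasePubkeyV2>, String>
where
    F: Fn(&[u8; COMPRESSED_LEN]) -> Result<[u8; UNCOMPRESSED_LEN], String>,
{
    old.iter()
        .map(|key| decompress(&key.0).map(DatabasePubkeyV2))
        .collect()
}
```

The decompression cost is paid once, during the migration, rather than on every startup. A downgrade path would run the same loop in the opposite direction with a compression closure.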
By storing the bytes uncompressed we'll double the disk space required for the pubkey cache, from 500k * 48 bytes = 24 MB to 500k * 96 bytes = 48 MB. Given the overall size of the database I believe this is an acceptable trade-off. We could attempt to apply a faster compression algorithm like zstd to the data, but seeing as the input is pseudo-random bytes I suspect this naive compression would be ineffective.
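The size estimate can be checked with a few lines of arithmetic. The 500k validator count is the figure used above, and cache_size_bytes is a hypothetical helper that ignores any per-entry database overhead.

```rust
/// Bytes per compressed BLS public key.
pub const COMPRESSED_LEN: u64 = 48;
/// Bytes per uncompressed BLS public key.
pub const UNCOMPRESSED_LEN: u64 = 96;

/// Raw key bytes needed for `num_validators` keys of `key_len` bytes each.
/// Ignores database keys, indices and other per-entry overhead.
pub fn cache_size_bytes(num_validators: u64, key_len: u64) -> u64 {
    num_validators * key_len
}
```

With 500_000 validators this gives 24_000_000 bytes compressed and 48_000_000 bytes uncompressed, matching the 24 MB and 48 MB figures.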
Version
Lighthouse v3.0.0
I've decided to work on withdrawals for now, and I'll come back to this if no one else picks it up :)
I'm making some changes to the on-disk pubkey cache in my tree-states PR, so it may make sense to roll this schema upgrade in there at the same time. Do you mind if I pick this one up @ethDreamer? 🙏
Yeah for sure 😊
Checking in on this - has it been put on the backburner for a while?
~~@jclapis It's fixed in unstable! (or should be)~~
~~We're just waiting on the new year (and a few more features) to ship v3.4.0~~
My bad, I was thinking of the ValidatorPubkeyCache lock timeouts!
This is implemented and working on the tree-states branch. We're planning an alpha or beta release of that branch soon, but it touches everything so we're being cautious. We're also doing a bunch of database cleanups all in one go, so we want to make sure it's perfect so we don't need to do it multiple times.
Resolved by:
- https://github.com/sigp/lighthouse/pull/5897
FINALLY!
This will be in v5.3.0, but anyone who wants early access can run unstable. It will migrate your DB to v21, but you can downgrade back to v19 if you want to run LH 5.2.1.