lodestar
lodestar copied to clipboard
Proposal for reorganizing shuffling code
Problem description
Currently we are calculating the shuffling for the next epoch during the state transition of the current epoch.
https://github.com/ChainSafe/lodestar/blob/00dfa63413e9b2a57dbaea70ac7151ef134b0c05/packages/state-transition/src/cache/epochCache.ts#L513-517
This takes about a second.
We propose to offload that work to idle time not on critical path during epoch transition which should shrink the transition duration by about a second.
The nextShuffling
is not necessary for epoch transition but is useful for calculating the next epoch proposers and sync committee duties so pre-computation is still recommended but it just doesnt need to be on critical path.
Solution description
Currently the shuffling code is housed in state-transition
and the shuffling results are available, on the EpochCache
. We propose to move that all to the ShufflingCache
in `beacon-node.
This will facilitate a couple of things. We will still have access to the results for service via API, the current consumers of the nextShuffling
and the shuffling results can be passed into the state-transition pre-computed. It will also open up the ability to do the shuffling on a separate thread or async in main thread idle times.
As a result of moving the shuffling to beacon-node the getBeaconProposersNextEpoch
and similar getters would also need to be moved out of state-transition as they rely on the nextShuffling that will only be available in beacon-node after a period of time (so they cannot be precomputed until the nextShuffling is finished).
I am looking either moving those onto the ShufflingCache but that feels like mixing of concerns so potentially I will create a PrecomputationCache
or similar to house that type of convenience information that gets served by the API.
Additional context
What are everyones thoughts on the approach and process?
Step 1) refactor code to move shuffling fully out of state-transition
and into beacon-node
. This will not change epoch transition times but will set the groundwork for the next steps
Step 2) create async processing code to move shuffling off critical path
Step 3) potentially create worker for precomputation tasks and implement with napi-nice to make sure kernel scheduling prioritizes main, network and bls threads
yes I strongly support the idea because having shuffling inside both EpochCache and ShufflingCache is not great. Some more contexts:
- the cache of shuffling inside state-transition is not for itself, it's only for beacon node to consume later
- it does not look right for beacon-node to call
state-transition
for a cache,state-transition
should be for state transition only, beacon-node should use theShufflingCache
-
n-historical state
work would limit the states due to its size but we can cache way more EpochShuffling because it's so lightweight - now when loading state from disk the api has to provide a ShufflingGetter function. It should be able to load state without it
@matthewkeil this approach sounds good to me
- related https://github.com/ChainSafe/lodestar/issues/6268
~~it's not possible to get rid of shuffling inside CachedBeaconState, it's needed in at least 2 places~~:
- upgradeStateToAltair()
- processSyncCommitteeUpdates()
~~this is required in the processSlots()
where it spans through multiple epoch transitions~~
update: we don't need epoch shuffling for the above apis as long as we cache previous/current/next active validator indices
- upgradeStateToAltair: can rebuild the committees/shuffling based on active validator indices and cached randao mixes inside the state
- processSyncCommitteeUpdates: it only needs next epoch validator indices which is available after
beforeProcessEpoch
- but for
processAttestations
we need it https://github.com/ChainSafe/lodestar/blob/3076b4cbe34bbe4ebadb005f78f90c256ab31aae/packages/state-transition/src/block/processAttestationsAltair.ts#L52