lodestar icon indicating copy to clipboard operation
lodestar copied to clipboard

Proposal for reorganizing shuffling code

Open matthewkeil opened this issue 1 year ago • 4 comments

Problem description

Currently we are calculating the shuffling for the next epoch during the state transition of the current epoch.

https://github.com/ChainSafe/lodestar/blob/00dfa63413e9b2a57dbaea70ac7151ef134b0c05/packages/state-transition/src/cache/epochCache.ts#L513-517

This takes about a second.

Screenshot 2024-02-02 at 11 13 35 AM (2)

We propose to offload that work to idle time not on critical path during epoch transition which should shrink the transition duration by about a second.

Screenshot 2024-02-02 at 11 14 00 AM (2)

The nextShuffling is not necessary for epoch transition but is useful for calculating the next epoch proposers and sync committee duties so pre-computation is still recommended but it just doesnt need to be on critical path.

Solution description

Currently the shuffling code is housed in state-transition and the shuffling results are available, on the EpochCache. We propose to move that all to the ShufflingCache in `beacon-node.

This will facilitate a couple of things. We will still have access to the results for service via API, the current consumers of the nextShuffling and the shuffling results can be passed into the state-transition pre-computed. It will also open up the ability to do the shuffling on a separate thread or async in main thread idle times.

As a result of moving the shuffling to beacon-node the getBeaconProposersNextEpoch and similar getters would also need to be moved out of state-transition as they rely on the nextShuffling that will only be available in beacon-node after a period of time (so they cannot be precomputed until the nextShuffling is finished).

I am looking either moving those onto the ShufflingCache but that feels like mixing of concerns so potentially I will create a PrecomputationCache or similar to house that type of convenience information that gets served by the API.

Additional context

What are everyones thoughts on the approach and process?

Step 1) refactor code to move shuffling fully out of state-transition and into beacon-node. This will not change epoch transition times but will set the groundwork for the next steps Step 2) create async processing code to move shuffling off critical path Step 3) potentially create worker for precomputation tasks and implement with napi-nice to make sure kernel scheduling prioritizes main, network and bls threads

matthewkeil avatar Feb 02 '24 04:02 matthewkeil

yes I strongly support the idea because having shuffling inside both EpochCache and ShufflingCache is not great. Some more contexts:

  • the cache of shuffling inside state-transition is not for itself, it's only for beacon node to consume later
  • it does not look right for beacon-node to call state-transition for a cache, state-transition should be for state transition only, beacon-node should use the ShufflingCache
  • n-historical state work would limit the states due to its size but we can cache way more EpochShuffling because it's so lightweight
  • now when loading state from disk the api has to provide a ShufflingGetter function. It should be able to load state without it

twoeths avatar Feb 02 '24 09:02 twoeths

@matthewkeil this approach sounds good to me

wemeetagain avatar Feb 05 '24 20:02 wemeetagain

  • related https://github.com/ChainSafe/lodestar/issues/6268

wemeetagain avatar Feb 05 '24 20:02 wemeetagain

~~it's not possible to get rid of shuffling inside CachedBeaconState, it's needed in at least 2 places~~:

  • upgradeStateToAltair()
  • processSyncCommitteeUpdates()

~~this is required in the processSlots() where it spans through multiple epoch transitions~~

update: we don't need epoch shuffling for the above apis as long as we cache previous/current/next active validator indices

  • upgradeStateToAltair: can rebuild the committees/shuffling based on active validator indices and cached randao mixes inside the state
  • processSyncCommitteeUpdates: it only needs next epoch validator indices which is available after beforeProcessEpoch
  • but for processAttestations we need it https://github.com/ChainSafe/lodestar/blob/3076b4cbe34bbe4ebadb005f78f90c256ab31aae/packages/state-transition/src/block/processAttestationsAltair.ts#L52

twoeths avatar Apr 03 '24 07:04 twoeths