Holesky Retro Features: Introduce mechanisms to force Lodestar to be on a minority chain
Problem description
During the recent Holesky event, we spent a lot of time forcing Lodestar to follow the correct fork. It would be great if we could add flags/APIs so that fork choice can be intervened in manually.
#7500 allows certain blocks to be blacklisted so they are never stored in fork choice. However, it would be great to have more tools on hand that let us exert more influence over fork-choice decision making in future events.
Solution description
Some ideas:
- Whitelisting blocks: Force LS to follow any fork that contains the whitelisted blocks (see the sketch after this list)
- A flag to disable optimistic sync
- An idea raised on Discord (link) that introduces a POST endpoint to allow LS to temporarily track and reorg to a peer specified by ENR for a limited time.
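For the whitelisting idea, here is a minimal sketch of what the check could look like, assuming a simple root-to-parent block map; none of these types or names are Lodestar's actual fork-choice API, they are purely illustrative:

```ts
// Hypothetical whitelist filter for head candidates: a candidate head is only
// viable if its ancestry contains every whitelisted block root.
type RootHex = string;

interface BlockSummary {
  blockRoot: RootHex;
  parentRoot: RootHex;
}

class BlockWhitelist {
  constructor(private readonly whitelisted: Set<RootHex>) {}

  /** Walk the candidate head's ancestry and require every whitelisted root to appear. */
  isViableHead(headRoot: RootHex, blocks: Map<RootHex, BlockSummary>): boolean {
    if (this.whitelisted.size === 0) return true; // no whitelist configured, accept any head
    const found = new Set<RootHex>();
    let current: RootHex | undefined = headRoot;
    while (current !== undefined && found.size < this.whitelisted.size) {
      if (this.whitelisted.has(current)) found.add(current);
      current = blocks.get(current)?.parentRoot; // stops at pruned/finalized ancestors
    }
    return found.size === this.whitelisted.size;
  }
}
```

In practice such a filter would presumably need to hook in wherever head candidates are scored (e.g. when filtering proto-array nodes), so that forks missing a whitelisted block are never selected as head.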
Additional context
No response
Sync is not exactly fork-choice related, but it is tangential. There are some discussions in Eth R&D about "checkpoint syncing" from unfinalized states. Several other clients are implementing that to help with restarts on mission save-holesky. There will end up being very long periods of non-finality, and it would be nice to be able to speed up sync to help with block production and chain liveness. It would be ideal to save some number of states "on the canonical chain" to allow for rapid sync. There are several security concerns about this feature for long-term usage, but for Holesky it seems like a good idea.
Discussion surrounds passing a root to the CLI on startup. Many have talked about that, but if the node dies unexpectedly it will auto-restart at whatever point was passed on the command line. It might be a better idea to persist the most recent state root to a file on the host after each fork-choice update (assuming there is only a single tip of the chain), and keep that file updated on every subsequent fork-choice update. That way, on startup the node reads the root from the file and uses it to pull a state out of the db as the checkpoint.
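A minimal sketch of that persist-and-restore flow, assuming plain files on the host; the file location and hook names here are made up for illustration and are not existing Lodestar code:

```ts
import {promises as fs} from "node:fs";
import path from "node:path";

// Hypothetical location for the persisted head root.
const HEAD_ROOT_FILE = path.join(process.env.HOME ?? ".", "beacon", "last-head-root");

/** Called after each fork-choice update, but only when the chain has a single tip. */
async function persistHeadRoot(headRoot: string, hasSingleTip: boolean): Promise<void> {
  if (!hasSingleTip) return; // only persist when there is no ambiguity about the head
  await fs.mkdir(path.dirname(HEAD_ROOT_FILE), {recursive: true});
  await fs.writeFile(HEAD_ROOT_FILE, headRoot, "utf8");
}

/** Called on startup; returns the persisted root, or null to fall back to normal sync. */
async function readPersistedHeadRoot(): Promise<string | null> {
  try {
    return (await fs.readFile(HEAD_ROOT_FILE, "utf8")).trim();
  } catch {
    return null; // no file yet, fall back to checkpoint sync / genesis
  }
}
```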
We already have the chain.nHistoricalStatesFileDataStore feature flag (false by default) to store checkpoint states on disk instead of in the db; by default all checkpoint states are stored at ~/beacon/checkpoint_states.
The file name is the serialized checkpoint in hex, like 0x00c50100000000001a97f8c5ae8e48ba53af214e621998bb7fc46c091c820666a0d31902b67ad3ca, which is basically ${epoch + root_hex}.
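For reference, a small sketch of decoding such a file name, under the assumption that it is the SSZ-serialized checkpoint (8-byte little-endian epoch followed by a 32-byte root) in hex:

```ts
// Decode a checkpoint_states file name into {epoch, rootHex}, assuming
// the name is the hex-encoded SSZ Checkpoint: uint64 epoch (LE) + 32-byte root.
function decodeCheckpointFileName(name: string): {epoch: number; rootHex: string} {
  const hex = name.startsWith("0x") ? name.slice(2) : name;
  const bytes = Buffer.from(hex, "hex");
  if (bytes.length !== 40) throw new Error(`expected 40 bytes, got ${bytes.length}`);
  const epoch = Number(bytes.readBigUInt64LE(0)); // first 8 bytes: epoch, little-endian
  const rootHex = "0x" + bytes.subarray(8).toString("hex"); // remaining 32 bytes: root
  return {epoch, rootHex};
}

// Under this assumption, the example file name above decodes to epoch 115968.
```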
Some advantages of using that flag:
- choose a specific checkpoint state when restarting
- share checkpoint states with other beacon nodes
- prune checkpoint states manually. Right now it takes 56GB per day, which is not sustainable
I think we want both: to be able to sync from an external unfinalized state, either via --checkpointState or --checkpointSyncUrl, and to be able to use a local checkpoint state from the db or a file.
If we store checkpoint states as files instead of in the db, we kinda already have the --checkpointState flag to point at one of those files.
> Right now it takes 56GB per day, which is not sustainable
I wonder if we could somehow prune persisted checkpoint states from really old epochs. I'm not sure if this is safe to do, as you might still need the state, although in practice that seems really unlikely if it's 10+ epochs old. Any other reasons why we need to persist all of them until the last finalized checkpoint?
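A rough sketch of what epoch-based pruning could look like, assuming the file-name layout described above and a retention window of N epochs; whether deleting these files is actually safe is exactly the open question here:

```ts
import {promises as fs} from "node:fs";
import path from "node:path";

// Hypothetical pruning pass: delete persisted checkpoint-state files whose epoch
// is more than `retainEpochs` behind the current epoch. The directory layout and
// the little-endian epoch prefix in the file name are assumptions from the comments above.
async function pruneOldCheckpointStates(dir: string, currentEpoch: number, retainEpochs = 10): Promise<void> {
  for (const name of await fs.readdir(dir)) {
    const hex = name.startsWith("0x") ? name.slice(2) : name;
    if (hex.length !== 80) continue; // not a 40-byte serialized checkpoint file name
    const epoch = Number(Buffer.from(hex.slice(0, 16), "hex").readBigUInt64LE(0));
    if (currentEpoch - epoch > retainEpochs) {
      await fs.unlink(path.join(dir, name)); // caution: only safe if the state is truly no longer needed
    }
  }
}
```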
> share checkpoint states with other beacon nodes
I like the idea of storing those as files. What is the downside of doing that? I would assume slower I/O and potentially higher storage requirements?
~~Based on CL hardening at interop, we may want to consider being able to checkpoint sync from an unfinalized state that is not at an epoch boundary~~. This was discussed but would be too difficult to implement without the previous state.