oasis-node: Independent consensus light history and pruning
Context
It has become impractical to keep all state since genesis, so more and more nodes are pruning it.
Problem
- Pruning is already causing availability issues:
  - E.g. https://github.com/oasisprotocol/internal-ops/issues/1281.
  - Previous issues with an empty validator set caused by missing light blocks.
- Consensus light blocks are computed ad hoc from the full state:
  - Running an observer with historical state requires all consensus data just to fetch light blocks - see.
Solution:
- Create an additional DB for consensus light blocks and query them from there.
- Prune the full state (NodeDB && blockstore) and the light state independently.
  - E.g. keep 1 year of light history by default (should be a few GB max).
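A minimal Go sketch of the two independent retention windows, assuming pruning is expressed as "keep the last N heights". `RetentionConfig`, its fields, and `retainFrom` are hypothetical names for illustration, not oasis-node configuration keys, and the ~6 s block time used to size "1 year" is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// RetentionConfig is a hypothetical configuration with two independent
// retention windows. Field names are illustrative, not real oasis-node keys.
type RetentionConfig struct {
	// FullStateKeepHeights is how many recent heights to keep in NodeDB and blockstore.
	FullStateKeepHeights int64
	// LightHistoryKeepHeights is how many recent heights to keep light headers for.
	// A value <= 0 means the light history is never pruned.
	LightHistoryKeepHeights int64
}

// retainFrom returns the lowest height that must be retained for a window of
// keep heights ending at latest; everything below it may be pruned. The result
// is clamped to earliest, and keep <= 0 retains everything.
func retainFrom(earliest, latest, keep int64) int64 {
	if keep <= 0 {
		return earliest
	}
	if h := latest - keep + 1; h > earliest {
		return h
	}
	return earliest
}

func main() {
	// Assumption: ~6 s average block time, used only to size "1 year" in heights.
	heightsPerYear := int64(365 * 24 * time.Hour / (6 * time.Second))

	cfg := RetentionConfig{
		FullStateKeepHeights:    100_000, // illustrative value
		LightHistoryKeepHeights: heightsPerYear,
	}
	earliest, latest := int64(1), int64(10_000_000)

	fmt.Println("full state retained from height", retainFrom(earliest, latest, cfg.FullStateKeepHeights))
	fmt.Println("light history retained from height", retainFrom(earliest, latest, cfg.LightHistoryKeepHeights))
}
```

Because each window is evaluated separately, an operator can prune the full state aggressively while still serving a long light history; a non-positive light window corresponds to not pruning the light history at all.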
Follow-ups:
- Similar thing should be done for the runtime state and its pruning (https://github.com/oasisprotocol/oasis-core/issues/6400).
- History expiry.
Proposed implementation (consensus)
Keep (ab)using the ad-hoc production of light blocks, and only write light headers to the new light history just before pruning the corresponding height.
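The write-before-prune step can be sketched in Go as follows. This is a minimal in-memory model, assuming a per-height pruning loop; `LightBlock`, `FullState`, `LightHistory` and `pruneHeight` are hypothetical stand-ins, not oasis-core or CometBFT types.

```go
package main

import (
	"errors"
	"fmt"
)

// LightBlock is a hypothetical placeholder for a consensus light block.
type LightBlock struct {
	Height int64
	Meta   []byte
}

// FullState models NodeDB/blockstore, from which light blocks are computed ad hoc.
type FullState struct{ blocks map[int64]LightBlock }

func (s *FullState) LightBlock(h int64) (LightBlock, error) {
	lb, ok := s.blocks[h]
	if !ok {
		return LightBlock{}, errors.New("height pruned or unknown")
	}
	return lb, nil
}

func (s *FullState) Prune(h int64) { delete(s.blocks, h) }

// LightHistory models the new, separate light-header DB.
type LightHistory struct{ blocks map[int64]LightBlock }

// Put stores a light block. A real store would need to persist it durably
// before the caller proceeds to prune the full state.
func (l *LightHistory) Put(lb LightBlock) { l.blocks[lb.Height] = lb }

// pruneHeight copies the light block for height h into the light history and
// only then prunes h from the full state. If the light block cannot be
// produced, pruning is refused so that no light block is lost.
func pruneHeight(full *FullState, light *LightHistory, h int64) error {
	lb, err := full.LightBlock(h)
	if err != nil {
		return fmt.Errorf("refusing to prune height %d: %w", h, err)
	}
	light.Put(lb)
	full.Prune(h)
	return nil
}

func main() {
	full := &FullState{blocks: map[int64]LightBlock{}}
	light := &LightHistory{blocks: map[int64]LightBlock{}}
	for h := int64(1); h <= 5; h++ {
		full.blocks[h] = LightBlock{Height: h}
	}
	for h := int64(1); h <= 3; h++ {
		if err := pruneHeight(full, light, h); err != nil {
			fmt.Println(err)
		}
	}
	fmt.Println("full state heights:", len(full.blocks), "light history heights:", len(light.blocks))
}
```

Ordering is the key design choice here: copying first and pruning second means a crash between the two steps leaves a duplicate rather than a gap.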
Pros:
- Simpler, as we don't have to implement a consensus light history (re)indexer.
- Easier and safer to synchronize data at the tail of the chain than at the tip.
Cons:
- Internally, serving a light header may require querying two databases.
  - Having all light headers pre-computed in the separate DB would instead improve the latency of this call and reduce the load on the consensus DBs.
Follow-up:
By default, every node fetches consensus light blocks in reverse until a certain target is reached (as an independent process that does not block initialization).
This way we can guarantee solid data availability across the whole network.
The same could be done for the runtime light history (https://github.com/oasisprotocol/oasis-core/issues/6305).
Additional consideration:
The light history size should be negligible compared to the other consensus DBs, so we might not prune it at all. We could possibly even fetch it in reverse up to genesis, although I find 1 year a sane default.