nimbus-eth2
Prune-to-era
Currently, era files are constructed by running a separate process on a daily basis, opening the sqlite database and writing any missing era files to disk.
The main process, on the other hand, recently gained support for pruning, where it simply drops the blocks - a node that has pruning enabled can no longer recreate era files for the pruned period.
Instead of deleting on prune, the pruned data could be written to era files, creating an immutable archive that ensures no data is lost. Pruning currently keeps 5 months of data, so data can be lost today only if the era export doesn't run for that whole period.
Pruning to era does not come without its own quirks and difficulties:
- Era files can be written incrementally, meaning that blocks can be added one by one as they are pruned - however, the last part of the era file (state and index) requires additional work, leading to "uneven" processing times
- The era writer would need to deal with partial data loss, for example during a power outage - a safe era writer would thus have to notify the database of a "safe pruning point" so that no data is deleted until the era file has been sealed.
- Because of the above, prune-to-era actually means "write era file" then "prune data that already exists in era", with some lag.
- Currently, pruning works block by block to keep the processing time "smooth" (instead of bulk-deleting things) - era file writing should follow a similar strategy, i.e. once we hit the 5-month prune point, we'd add a "lag" during which era files are written, then another lag so that the "tail" pointer jump that happens on state removal is smoothed out
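
To make the "write era file, then prune with some lag" idea concrete, here is a minimal toy sketch in Python (not Nim, and not the actual nimbus-eth2 code). Everything in it is hypothetical: the `PruneToEra` class, its `export_step` / `prune_step` methods, and the dict standing in for the database. "Sealing" is reduced to a list append where the real writer would write the state and index and flush to disk. The point is only that the pruner never advances past the `safe_prune_slot` published by the writer.

```python
class PruneToEra:
    """Toy model of prune-to-era; all names here are hypothetical."""

    def __init__(self, db, slots_per_era=8192):
        self.db = db                      # dict: slot -> block bytes (stand-in for the database)
        self.slots_per_era = slots_per_era
        self.next_slot = 0                # next slot to copy into the open era
        self.open_era = {}                # blocks of the era currently being written
        self.sealed_eras = []             # eras that are complete and durable
        self.safe_prune_slot = 0          # blocks below this slot may be deleted

    def export_step(self, horizon):
        """Copy at most one slot below `horizon` into the open era."""
        if self.next_slot >= horizon:
            return
        slot = self.next_slot
        if slot in self.db:               # empty slots simply have no block
            self.open_era[slot] = self.db[slot]
        self.next_slot += 1
        if self.next_slot % self.slots_per_era == 0:
            # Stand-in for the expensive part: state + index + fsync.
            self.sealed_eras.append(self.open_era)
            self.open_era = {}
            # Only now is it safe to delete the blocks of this era.
            self.safe_prune_slot = self.next_slot

    def prune_step(self, tail):
        """Delete at most one block, never past the safe pruning point."""
        if tail < self.safe_prune_slot:
            self.db.pop(tail, None)
            return tail + 1
        return tail
```

Because both steps handle one slot at a time, the work stays "smooth" in the same sense as the existing block-by-block pruning. Only the seal at the era boundary is a larger unit of work, which is where the "uneven" processing time mentioned above would show up.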
All in all, to safely export era files today, the era writer needs to run at least once every 5 months, else data is lost. This is not too bad, but if the era writer instead runs every day, data is duplicated in the era files and the database for those 5 months, which wastes space.
Come to think of it, the function that detects sealed era files and removes the corresponding blocks from the database could be written already - this would "more or less" implement this feature, assuming the separate era exporter is run on a regular basis.
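
A rough sketch of such a function, again in Python for illustration only, could look like the following. It rests on several assumptions that are not taken from the nimbus-eth2 code: era files are named `<network>-<5-digit era>-<8 hex chars>.era`, a file only gets that final name once it is sealed (e.g. it is written under a temporary name and renamed), era file `n` covers the blocks of slots `[(n-1) * 8192, n * 8192)`, and blocks live in a hypothetical `blocks(slot, data)` table. The real database layout is different.

```python
import re
import sqlite3
from pathlib import Path

SLOTS_PER_ERA = 8192  # assumption: slots covered by one era file

# Assumed naming scheme: <network>-<era number>-<short root>.era
ERA_NAME = re.compile(r"^(?P<network>[a-z0-9]+)-(?P<era>\d{5})-(?P<root>[0-9a-f]{8})\.era$")


def sealed_eras(era_dir, network):
    """Return the sorted era numbers that have a finished era file on disk."""
    eras = []
    for path in Path(era_dir).iterdir():
        m = ERA_NAME.match(path.name)
        if m and m["network"] == network:
            eras.append(int(m["era"]))
    return sorted(eras)


def prune_sealed(db, era_dir, network, slots_per_era=SLOTS_PER_ERA):
    """Delete blocks already covered by a sealed era file; return rows deleted."""
    deleted = 0
    with db:  # single transaction: commit on success, roll back on error
        for era in sealed_eras(era_dir, network):
            if era == 0:
                continue  # assumed to hold no blocks, only the genesis state
            start = (era - 1) * slots_per_era
            end = era * slots_per_era
            cur = db.execute(
                "DELETE FROM blocks WHERE slot >= ? AND slot < ?",  # hypothetical table
                (start, end),
            )
            deleted += cur.rowcount
    return deleted
```

A real version would probably also want to verify the era file (for example by checking its index or root) before trusting it, and to delete in small batches rather than a whole era at once, to keep processing time smooth.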