Automatic history pruning
Resolves #7943
Changes
- Drop history outside of configurable window on fork choice
Types of changes
What types of changes does your code introduce?
- [ ] Bugfix (a non-breaking change that fixes an issue)
- [x] New feature (a non-breaking change that adds functionality)
- [ ] Breaking change (a change that causes existing functionality not to work as expected)
- [ ] Optimization
- [ ] Refactoring
- [ ] Documentation update
- [ ] Build-related changes
- [ ] Other: Description
Testing
Requires testing
- [x] Yes
- [ ] No
If yes, did you write tests?
- [x] Yes
- [ ] No
Documentation
Requires documentation update
- [x] Yes
- [ ] No
Add a section on configuring History settings. Users can configure DropPreMerge and HistoryPruneEpochs as they choose.
Requires explanation in Release Notes
- [x] Yes
- [ ] No
Added support for automated removal of block history.
Should we consider the state of block processing and synchronization? Should we prune only when we are free from processing/production? Should we wait for blocks/receipts or even state sync before pruning? Should we consider the settings in block/receipt sync?
Good questions, I would appreciate more input from people who are more familiar with the sync process than I am. My thinking behind not checking if we are syncing:
- We only prune blocks older than 82125 epochs, outside the weak subjectivity period so it shouldn't affect anything
- If for some reason it does, then we may want to prune while syncing. If we didn't do this the size of the history DB could get very large during sync, before eventually being pruned. This loses the benefits of history expiry, as we will still need disk space to store history while syncing
You also mention block production, do you think we shouldn't prune while producing a block? Atm I can't see a problem with doing that
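For a sense of scale, the 82125-epoch window mentioned above can be converted to wall-clock time. This is a minimal sketch that assumes Ethereum mainnet timing (32 slots per epoch, 12 seconds per slot); those constants and the function names are illustrative and are not taken from this PR.

```python
# Assumed Ethereum mainnet consensus constants (not stated in this thread).
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12
SECONDS_PER_DAY = 86_400


def prune_window_seconds(epochs: int) -> int:
    """Length of the retained history window in seconds."""
    return epochs * SLOTS_PER_EPOCH * SECONDS_PER_SLOT


def prune_cutoff_timestamp(head_timestamp: int, epochs: int = 82125) -> int:
    """Blocks with a timestamp below this value fall outside the window."""
    return head_timestamp - prune_window_seconds(epochs)


window = prune_window_seconds(82125)
print(window, window / SECONDS_PER_DAY)  # 31536000 365.0
```

Under those assumptions the window is exactly 365 days, so only blocks more than a year behind the head would be candidates for pruning.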
> We only prune blocks older than 82125 epochs, outside the weak subjectivity period so it shouldn't affect anything
We sync old blocks, so I suppose such pruning will prune lots of them
> - If for some reason it does, then we may want to prune while syncing. If we didn't do this the size of the history DB could get very large during sync, before eventually being pruned. This loses the benefits of history expiry, as we will still need disk space to store history while syncing
Yes, so what if we change sync to take the pruning boundary into account and not sync old blocks at all? Then after syncing we turn on pruning and prune whatever has become outdated (maybe a dozen blocks or so).
> You also mention block production, do you think we shouldn't prune while producing a block? Atm I can't see a problem with doing that
Pruning is a secondary task that requires resources, seems like a good candidate for execution when ProcessingQueueEmpty. Purely optional for this request
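One way to picture the "prune only when the processing queue is empty" idea is a small deferral wrapper. This is a hypothetical Python sketch, not Nethermind code: `DeferredPruner`, `request_prune`, and `on_processing_queue_empty` are invented names, and the real ProcessingQueueEmpty event is modeled here as a plain method call.

```python
from typing import Callable


class DeferredPruner:
    """Remembers that pruning was requested and runs it only when idle."""

    def __init__(self, prune: Callable[[], None]) -> None:
        self._prune = prune      # hypothetical callback that does the actual pruning
        self._pending = False

    def request_prune(self) -> None:
        # Called on fork choice; repeated requests collapse into one pending run.
        self._pending = True

    def on_processing_queue_empty(self) -> None:
        # Stand-in for a handler subscribed to a ProcessingQueueEmpty-style event.
        if self._pending:
            self._pending = False
            self._prune()
```

With this shape, several fork-choice updates during a busy period result in a single pruning pass once block processing goes idle.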
There should definitely be changes to the syncer. Why sync blocks that I will be pruning away later?! Just stop syncing bodies and receipts when reaching the boundary.
For the sync part of the code, the easiest way to integrate it is with some kind of interface, IBlockPersistenceStrategy, with a ShouldPersistBlock(BlockInfo) method. BodiesSyncFeed and ReceiptsSyncFeed have a SyncStatusList with a TryGetInfosForBatch method that accepts a function you can use to specify whether the block should be downloaded. You can also update SyncConfigBarrierCalc so that the progress log makes more sense. Because the head, suggested header, and best numbers are all confusing and painful to think about, I suggest just using _blockTree.SyncPivot as the assumed head. It makes things much easier to reason about.
Actually, peers may not serve blocks if this is not done accurately, so the assumed head should probably be BestSuggestedHeader.
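The suggested strategy interface could look roughly like the following. This is a Python stand-in for illustration only: the `BlockInfo` fields, the timestamp-based cutoff, and the `select_for_download` helper are assumptions, and none of it reflects the actual signatures of IBlockPersistenceStrategy, SyncStatusList, or TryGetInfosForBatch.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BlockInfo:
    # Hypothetical minimal shape; the real BlockInfo type may differ.
    number: int
    timestamp: int


class HistoryWindowStrategy:
    """Persist a block only if it falls inside the retained history window."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds

    def should_persist_block(self, block: BlockInfo, assumed_head: BlockInfo) -> bool:
        # assumed_head plays the role of whichever header the syncer treats as
        # the head (the thread suggests BestSuggestedHeader).
        cutoff = assumed_head.timestamp - self.window_seconds
        return block.timestamp >= cutoff


def select_for_download(
    candidates: Iterable[BlockInfo],
    strategy: HistoryWindowStrategy,
    assumed_head: BlockInfo,
) -> List[BlockInfo]:
    # Stand-in for passing a predicate to a batch selector in the sync feeds.
    return [b for b in candidates if strategy.should_persist_block(b, assumed_head)]
```

The bodies and receipts feeds could then share one predicate, so blocks that would be pruned immediately are never requested from peers.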
Need to trigger Eth69ProtocolHandler.NotifyOfNewRange when earliest available block changes. Either as part of this or a follow-up PR.
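A rough sketch of that trigger, assuming the notification only needs to fire when the earliest available block actually changes. The tracker class and the `(earliest, latest)` callback shape are hypothetical; the real signature of Eth69ProtocolHandler.NotifyOfNewRange is not shown in this thread.

```python
from typing import Callable, Optional


class EarliestBlockTracker:
    """Invokes a callback when the earliest available block number changes."""

    def __init__(self, notify: Callable[[int, int], None]) -> None:
        self._notify = notify              # stand-in for a NotifyOfNewRange-style call
        self._earliest: Optional[int] = None

    def update(self, earliest: int, latest: int) -> None:
        # Called after each pruning pass; a new head alone does not trigger a
        # notification in this sketch, only a moved lower bound does.
        if earliest != self._earliest:
            self._earliest = earliest
            self._notify(earliest, latest)
```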