Adjust pruning to finalization
Is your feature request related to a problem? Please describe
We need to make sure that pruning works, that the node stays fully functional, and that it remains resistant to re-orgs. Currently, if justification is not reached within the number of blocks set via the -pruning= parameter, the pruning feature can start deleting blocks that are essential for re-orgs.
Describe the solution you'd like
One solution is to never prune blocks that come after the latest justification. This is a relatively simple approach and guarantees that the node will always be able to re-org. The downside is that the node can't guarantee fixed disk usage. Currently, the default value for pruning is 550 blocks, or 11 epochs.
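To make the disk-usage trade-off concrete, here is a minimal sketch of the proposed rule. The function names and the height-based bookkeeping are hypothetical and are not taken from the codebase; only the default of 550 blocks comes from the description above.

```python
DEFAULT_KEEP_BLOCKS = 550  # default pruning depth quoted above (11 epochs)


def highest_prunable_height(tip_height, last_justified_height,
                            keep_blocks=DEFAULT_KEEP_BLOCKS):
    """Highest block height that may be deleted; -1 means nothing may be deleted.

    Blocks after the latest justification are never pruned, even when they are
    deeper than the configured pruning depth.
    """
    by_depth = tip_height - keep_blocks
    return max(-1, min(by_depth, last_justified_height))


def retained_blocks(tip_height, last_justified_height,
                    keep_blocks=DEFAULT_KEEP_BLOCKS):
    """Number of blocks kept on disk under the rule above."""
    return tip_height - highest_prunable_height(
        tip_height, last_justified_height, keep_blocks)


# Justification keeps up: the configured depth is the binding limit.
print(retained_blocks(tip_height=10_000, last_justified_height=9_950))  # 550
# Justification stalled at height 9_000: 1_000 blocks stay on disk.
print(retained_blocks(tip_height=10_000, last_justified_height=9_000))  # 1000
```

The second call shows the downside mentioned above: while justification stalls, the number of retained blocks grows with the chain instead of staying at the configured depth.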
Describe alternatives you've considered
Alternatively, during a re-org the node can download the missing blocks, validate them, and then throw them away to keep disk usage fixed. The downside of this approach is the extra bandwidth needed to download the missing blocks.
Additional context
What is important is not only that we do not throw away blocks from after the finalization point, but also that we do not throw away blocks from before the finalization point which contained UTXOs at the point of finalization and whose spent/unspent status is subject to change in the non-finalized period after and during reorgs. That's why pruning should only be triggered at the finalization which happens after that, and must not happen in between, as it currently might (since it is triggered by time).
Here is the example from a previous Slack conversation which outlined the issue in greater detail:
Consider these chains:
... -> A -> ... -> X at time t=0
X is a finalization point. At time t=0 A contains a UTXO.
... -> A -> ... -> X -> ... -> M at time s > t
M references the UTXO from A as stake.
A block N arrives at u > s. Say it should trigger a reorg to this chain:
... -> A -> ... -> X -> ... -> N such that M is no longer part of it (N had more stake).
The proposer that proposed M proposes a block P which refers to the same UTXO from A.
Now at time s I have no problem validating M, as the UTXO from A is not spent and I always have access to the UTXO set.
When N arrives I can validate it without problems too. The UTXO from A is spent at that point in time, and it stays spent until I have performed the reorg (it will be unspent again afterwards because M is no longer part of the active chain).
When the new block P arrives I might have a problem now. I try to validate it before I trigger the reorg, but I can't access that UTXO (it was spent in between and might have been thrown away by pruning).
Am I correct?
I am reasoning that this might be the case because the eviction might be triggered sometime between s and u.
That's why I think eviction should only be triggered at the point we finalize.
I am stumbling across this now that I'm working on the kernel/stake validation. That's how I figured out that Particl does not support pruning. They need to always be able to access every transaction, even if it was spent, because it might become unspent in the case of a reorg.
We only reorg non-finalized parts of the chain, nevertheless it might affect the spent/unspent status of a UTXO from before the finalization point, as outlined above.
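The scenario above can be replayed with a toy model. This is only a sketch under loud assumptions: `ToyNode`, its block store, and the fallback lookup are hypothetical stand-ins and do not reflect the node's real data structures. It shows that validating P succeeds only if block A is still on disk once M has spent the output.

```python
class ToyNode:
    """Toy model of a pruning node; not the node's real data structures."""

    def __init__(self):
        self.block_store = {"A": {"A:0"}}  # block name -> outputs it created
        self.unspent = {"A:0"}             # UTXO view of the active chain

    def connect(self, spent_outpoint):
        """Connect a block to the active chain; the output it stakes is spent."""
        self.unspent.remove(spent_outpoint)

    def prune(self, block_name):
        """Delete a block's data from disk."""
        self.block_store.pop(block_name, None)

    def can_validate_stake(self, outpoint, created_in):
        """True if the staked output can still be looked up."""
        if outpoint in self.unspent:
            return True
        # Spent on the active chain: fall back to the stored block, if any.
        return outpoint in self.block_store.get(created_in, set())


def run(prune_between_s_and_u):
    node = ToyNode()
    node.connect("A:0")    # time s: M stakes the output from A
    if prune_between_s_and_u:
        node.prune("A")    # eviction triggered by time, not by finalization
    # After u: P stakes the same output on the competing branch.
    return node.can_validate_stake("A:0", created_in="A")


print(run(prune_between_s_and_u=False))  # True
print(run(prune_between_s_and_u=True))   # False
```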
What is important is not only that we do not throw away blocks from after the finalization point, but also that we do not throw away blocks from before the finalization point which contained UTXOs at the point of finalization and whose spent/unspent status is subject to change in the non-finalized period after and during reorgs.
I think in this case it will be possible to disable pruning altogether by creating UTXO at each block and not spending them.
Let's focus only on the pruning mode in this issue, and move all other possible improvements to a separate issue.
So, when we run in pruning mode, we should allow deleting only those blocks which are up to the latest finalized checkpoint, but not the checkpoint block itself. The reason for keeping the block of the last finalized checkpoint is that we need to download the parent block of the snapshot after the fast sync, and the snapshot always points to the block which is one block before the checkpoint.
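As a rough sketch of that rule, the predicate below treats a block as deletable only if it lies strictly before the last finalized checkpoint. The name `is_prunable` is hypothetical, and it is an assumption here that the configured pruning depth (550 blocks by default) continues to apply on top of the checkpoint limit.

```python
def is_prunable(height, tip_height, finalized_checkpoint_height, keep_blocks=550):
    """True if the block at `height` may be deleted in pruning mode."""
    # Assumption: the configured pruning depth still applies.
    old_enough = height <= tip_height - keep_blocks
    # Strict inequality: the finalized checkpoint block itself is kept.
    before_checkpoint = height < finalized_checkpoint_height
    return old_enough and before_checkpoint


tip, checkpoint = 10_000, 9_000
print(is_prunable(8_999, tip, checkpoint))  # True: before the checkpoint
print(is_prunable(9_000, tip, checkpoint))  # False: the checkpoint itself
print(is_prunable(9_200, tip, checkpoint))  # False: after the checkpoint
```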