teku
Automatically recover from late upgrade for a fork
Description
When a user fails to upgrade prior to a fork, they may import blocks after the fork epoch that are from the old milestone. After Teku is updated, it then fails to start because of those blocks. Teku should delete the invalid blocks and resume syncing from either the latest finalized state or the last valid block.
Worth noting that in prune mode we won't be able to recover if the invalid blocks go past the finalized checkpoint, because we don't have a state for that. We probably shouldn't allow reverting a finalized block even in archive mode, so I think this will boil down to detecting that blocks in the database are invalid and effectively deleting the hot database so that we start fresh from the finalized block. It won't be that simple, but that's the effect we'll want.
We now have a `delete-hot-blocks` debug db subcommand which will recover. We would need to add some proper testing around it, and then we could potentially look to run it automatically if we fail to load hot blocks at startup because of invalid SSZ. It may be worth only "fixing" the db in this case if an extra option is specified (e.g. `--data-storage-recovery-enabled`) for extra safety.
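To make the proposed opt-in behaviour concrete, here is a rough sketch of the startup decision described above. It is not Teku code: `HotBlockRecoverySketch`, `HotDatabase`, `StoredBlock`, and `loadOrRecover` are all hypothetical names, the "database" is an in-memory list, and the `decodable` flag simply stands in for "the stored SSZ deserializes under the current milestone".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of opt-in hot-block recovery; none of these names are Teku APIs. */
public class HotBlockRecoverySketch {

  /** Stand-in for a stored hot block: its slot, and whether its bytes decode under the current milestone. */
  static final class StoredBlock {
    final long slot;
    final boolean decodable;

    StoredBlock(long slot, boolean decodable) {
      this.slot = slot;
      this.decodable = decodable;
    }
  }

  /** Minimal in-memory stand-in for the finalized checkpoint plus the hot (non-finalized) blocks. */
  static final class HotDatabase {
    final long finalizedSlot;
    final List<StoredBlock> hotBlocks;

    HotDatabase(long finalizedSlot, List<StoredBlock> hotBlocks) {
      this.finalizedSlot = finalizedSlot;
      this.hotBlocks = new ArrayList<>(hotBlocks);
    }
  }

  /**
   * Returns the slot to resume syncing from. If any hot block fails to decode,
   * recovery only happens when explicitly enabled; otherwise startup fails as it does today.
   */
  static long loadOrRecover(HotDatabase db, boolean recoveryEnabled) {
    boolean corrupted = db.hotBlocks.stream().anyMatch(b -> !b.decodable);
    if (!corrupted) {
      // Healthy database: resume from the newest hot block, or the finalized slot if there are none.
      return db.hotBlocks.stream().mapToLong(b -> b.slot).max().orElse(db.finalizedSlot);
    }
    if (!recoveryEnabled) {
      throw new IllegalStateException(
          "Hot blocks failed to decode; enable recovery or delete the hot blocks manually");
    }
    // Recovery: drop every hot block and start fresh from the finalized block.
    db.hotBlocks.clear();
    return db.finalizedSlot;
  }

  public static void main(String[] args) {
    HotDatabase db =
        new HotDatabase(100, List.of(new StoredBlock(101, true), new StoredBlock(102, false)));
    System.out.println(loadOrRecover(db, true)); // prints 100
  }
}
```

The sketch drops all hot blocks rather than only the undecodable ones, matching the "start fresh from the finalized block" effect described earlier; keeping the last valid block instead would need extra care around blocks whose parents were removed.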
Experience with Bellatrix suggests that tool is enough - more and more clients are moving to a model that doesn't automatically recover anyway. Closing this.