
Automatically recover from late upgrade for a fork

Open ajsutton opened this issue 3 years ago • 2 comments

Description

When a user fails to upgrade before a fork, their node may import blocks from the old milestone after the fork epoch. Once Teku is updated, it then fails to start because of those blocks. Teku should delete the invalid blocks and resume syncing from either the latest finalized state or the last valid block.
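A minimal sketch of the detection step, assuming a simplified hot database that holds a single linear chain in slot order and a single fork epoch. `StoredBlock`, `LateUpgradeDetector` and the milestone strings are hypothetical illustrations, not Teku's actual storage API (the real hot store is a block tree). The idea is that the first block whose milestone does not match the one expected at its slot marks where the invalid blocks start; everything before it is the part of the chain sync could resume from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified view of a block stored in the hot database.
record StoredBlock(long slot, String milestone) {}

final class LateUpgradeDetector {
  private LateUpgradeDetector() {}

  /** Milestone a block at this slot must use, assuming a single fork epoch. */
  static String expectedMilestone(
      long slot, long forkEpoch, long slotsPerEpoch, String oldMilestone, String newMilestone) {
    return slot / slotsPerEpoch >= forkEpoch ? newMilestone : oldMilestone;
  }

  /**
   * Returns the blocks (in slot order) up to, but excluding, the first block whose
   * milestone does not match the one expected at its slot. That block and everything
   * after it would be deleted, and sync would resume from the last returned block.
   */
  static List<StoredBlock> validPrefix(
      List<StoredBlock> chain,
      long forkEpoch,
      long slotsPerEpoch,
      String oldMilestone,
      String newMilestone) {
    List<StoredBlock> valid = new ArrayList<>();
    for (StoredBlock block : chain) {
      String expected =
          expectedMilestone(block.slot(), forkEpoch, slotsPerEpoch, oldMilestone, newMilestone);
      if (!expected.equals(block.milestone())) {
        break; // first block imported with the wrong (old) milestone
      }
      valid.add(block);
    }
    return valid;
  }
}
```

For example, with a fork at epoch 2 and 32 slots per epoch, an old-milestone block at slot 64 is the first invalid one, so only the block at slot 63 is kept.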

ajsutton, Nov 02 '21 09:11

Worth noting that in prune mode we won't be able to recover if the invalid blocks go past the finalized checkpoint because we don't have a state for that. We probably shouldn't allow reverting a finalized block even in archive mode so I think this will boil down to detecting that blocks in the database are invalid and effectively deleting the hot database so that we start fresh from the finalized block. It won't be that simple but that's the effect we'll want.
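One possible reading of the rule in the comment above, written as a tiny decision function. It assumes that "the invalid blocks go past the finalized checkpoint" means the first invalid block is at or before the finalized slot; `HotDbRecovery` and `resumeSlot` are hypothetical names. If every invalid block is newer than the finalized checkpoint, the hot data is dropped and the node restarts from the finalized block; otherwise recovery would mean reverting a finalized block, which the comment argues against in both prune and archive mode, so no recovery is offered.

```java
import java.util.Optional;

final class HotDbRecovery {
  private HotDbRecovery() {}

  /**
   * Hypothetical decision rule. Returns the slot to restart from (the finalized
   * block) when all invalid blocks are newer than the finalized checkpoint, or
   * empty when recovering would require reverting a finalized block.
   */
  static Optional<Long> resumeSlot(long finalizedSlot, long firstInvalidSlot) {
    if (firstInvalidSlot > finalizedSlot) {
      // Effectively "delete the hot database" and start fresh from finalized.
      return Optional.of(finalizedSlot);
    }
    // Invalid blocks were finalized: do not revert finality automatically.
    return Optional.empty();
  }
}
```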

ajsutton, Jun 09 '22 06:06

We now have a delete-hot-blocks debug db subcommand which will recover. We would need to add proper testing around it, and then we could potentially run it automatically if we fail to load hot blocks at startup because of invalid SSZ. It may be worth only "fixing" the db in this case if an extra option is specified (e.g. --data-storage-recovery-enabled) for extra safety.
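A sketch of the opt-in gating described in this comment, assuming startup already knows whether hot blocks loaded and whether a failure was caused by invalid SSZ. `StartupRecoveryGate`, `Outcome` and the `deleteHotBlocks` callback are hypothetical; the boolean `recoveryEnabled` stands in for the proposed `--data-storage-recovery-enabled` option, which the comment only offers as an example name.

```java
final class StartupRecoveryGate {
  private StartupRecoveryGate() {}

  /** Hypothetical result of the startup check. */
  enum Outcome { STARTED, RECOVERED, FAILED }

  /**
   * Only deletes hot blocks automatically when loading failed because of invalid
   * SSZ AND the operator explicitly opted in; any other failure stops startup.
   */
  static Outcome onStartup(
      boolean hotBlocksLoaded,
      boolean failedWithInvalidSsz,
      boolean recoveryEnabled,
      Runnable deleteHotBlocks) {
    if (hotBlocksLoaded) {
      return Outcome.STARTED;
    }
    if (failedWithInvalidSsz && recoveryEnabled) {
      deleteHotBlocks.run(); // same effect as running the delete-hot-blocks tool
      return Outcome.RECOVERED;
    }
    // Without the opt-in, leave the database untouched for manual recovery.
    return Outcome.FAILED;
  }
}
```

Keeping the deletion behind an explicit option means a node never discards data on its own unless the operator has asked for that behaviour.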

ajsutton, Aug 09 '22 09:08

Experience with Bellatrix suggests that the tool is enough; more and more clients are moving to a model that doesn't automatically recover anyway. Closing this.

ajsutton, Dec 02 '22 00:12