Backup & restore the state of a Hydra Head

Open ch1bo opened this issue 3 years ago • 3 comments

What & Why

Currently, there is no mechanism to recover the current state of a Hydra Head when restarting the hydra-node (e.g. following a crash). As a consequence, the Hydra Head can't continue processing transactions and, even worse, trust is required between the participants to close the Head in a non-adversarial manner.

Persisting the state of the Hydra Head on disk and restoring it on restart will allow hydra-node to resume operations and recover from unexpected downtime.
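To make the idea concrete, here is a minimal sketch of "persist on disk, restore on restart". It is an illustration only, not how hydra-node does or should implement it: the JSON encoding, the file name and the shape of the state are all assumptions. The one point it tries to show is that the write should be atomic (temp file plus rename), so that a crash in the middle of saving never leaves a half-written state behind.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write `state` as JSON via a temp file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit the disk before renaming
        os.replace(tmp, path)  # atomic: readers see the old or the new file, never a mix
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_state(path, initial):
    """Return the persisted state, or `initial` if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return initial
```

On startup the node would call something like `load_state("head-state.json", {"status": "Idle"})` (hypothetical names) and carry on from whatever it finds.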

Out of scope: Longer downtimes (how long is tolerable depends on the contestation period, a Hydra protocol parameter) are not covered!

Requirements

  • The hydra-node can be restarted without losing its knowledge of Hydra Head(s)
  • An open Hydra Head can always be closed after restart
  • An initiated Hydra Head can always be aborted after restart
  • A restarted Hydra node (ideally) can progress in L2 transaction processing
  • It's acceptable that a Hydra Head might still not progress, e.g. because of missed network events (related: #188)

To be discussed

  • Technical detail: Shall we store the accumulated head state or all incoming events before they are processed?
  • Storage format: backward-compatibility, introspect-ability ("white-box" & audit a running Hydra Head?)
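For the "state vs. events" question, a rough sketch of the event-log variant may help the discussion. Everything here is made up for illustration: the event kinds, the state shape and the one-JSON-object-per-line format are assumptions, not the Hydra protocol. Events are appended to a log before being applied, and the state is rebuilt on restart by folding over the log.

```python
import json

# Hypothetical initial state of the toy head.
IDLE = {"status": "Idle", "confirmed": []}


def apply_event(state, event):
    """Toy reducer over made-up event shapes; must be deterministic."""
    kind = event["kind"]
    if kind == "HeadOpened":
        return {"status": "Open", "confirmed": []}
    if kind == "TxConfirmed":
        return {**state, "confirmed": state["confirmed"] + [event["tx"]]}
    if kind == "HeadClosed":
        return {**state, "status": "Closed"}
    return state  # unknown events leave the state unchanged


def append_event(log_path, event):
    """Append one event as a JSON line, before it is processed."""
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")


def replay(log_path):
    """Rebuild the state on restart by folding all logged events."""
    state = IDLE
    try:
        with open(log_path) as f:
            for line in f:
                if line.strip():
                    state = apply_event(state, json.loads(line))
    except FileNotFoundError:
        pass  # no log yet: start from the initial state
    return state
```

The trade-off as far as this sketch goes: a line-based event log is easy to inspect and audit, but it grows without bound and old entries must stay readable by newer versions of the reducer. Storing only the accumulated state is smaller and faster to load, but loses the history.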

Tasks

  • [ ] #554
  • [x] #257
  • [ ] #541

ch1bo avatar Jan 30 '22 17:01 ch1bo

Added some bullets for some subtasks of this.

ch1bo avatar Apr 19 '22 14:04 ch1bo

One thing to consider: When we restore from persistence, we would need to know which ChainPoint we had reached before and resynchronize with the node from that point onward. Otherwise, we might "miss" chain events?
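A small sketch of what I mean, with a toy chain instead of a real cardano-node. The representation is an assumption for illustration: a chain point is a `(slot, block_hash)` pair, the chain is a list of blocks, and the "state" is just the list of events seen so far. The last processed point is persisted together with the state, and processing resumes right after it.

```python
def process_from(chain, state, last_point):
    """Apply all blocks after `last_point` and return (state, new_point).

    chain:      list of (slot, block_hash, events), oldest first
    last_point: (slot, block_hash) of the last processed block, or None
                to start from the beginning
    Raises ValueError if `last_point` is not on `chain`.
    """
    start = 0
    if last_point is not None:
        points = [(slot, block_hash) for slot, block_hash, _ in chain]
        start = points.index(last_point) + 1
    for slot, block_hash, events in chain[start:]:
        state = state + events
        last_point = (slot, block_hash)
    return state, last_point


chain = [
    (10, "a", ["init"]),
    (20, "b", ["commit"]),
    (30, "c", ["collect"]),
]

# Run until the "crash": only the first two blocks were seen.
state, point = process_from(chain[:2], [], None)

# After restart, resume from the persisted point instead of the tip.
state, point = process_from(chain, state, point)
```

Resuming from the persisted point means no chain event is skipped or applied twice, provided state and point are written together. If they can get out of step, we are back to missing or duplicating events.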

ch1bo avatar Sep 09 '22 07:09 ch1bo

Also, rollbacks:

Could it be that we were temporarily on a fork, stopped the node, and then restored it wanting to synchronize from the same ChainPoint, while the cardano-node has, in the meantime, rolled back off the fork? When we reconnect to it, would we see a rollback or no intersection? How should we handle this, with or without storing the chain state?

Furthermore, if we have no intersection, we would need to know alternative past ChainPoints from which we could synchronize, i.e. this info needs to be in the persisted data. Right now, that info would be in the linked list of ChainStateAt.
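Roughly what the fallback could look like, as a sketch under assumptions: the persisted history is modelled as a newest-first list of `(point, chain_state)` pairs (standing in for the linked list of ChainStateAt), and the node's current chain is given as a plain list of points. A real client would ask the node for the intersection rather than hold the whole chain.

```python
def find_intersection(history, node_chain):
    """Pick the newest persisted point that is still on the node's chain.

    history:    list of (point, chain_state), newest first
    node_chain: iterable of points on the node's current chain
    Returns (point, chain_state), or None if there is no intersection.
    """
    on_chain = set(node_chain)
    for point, chain_state in history:
        if point in on_chain:
            return point, chain_state
    return None


# We persisted a state at slot 30 while on a fork that was later rolled back.
history = [
    ((30, "c-fork"), "state@30"),
    ((20, "b"), "state@20"),
    ((10, "a"), "state@10"),
]
node_chain = [(10, "a"), (20, "b"), (30, "c"), (40, "d")]

resume = find_intersection(history, node_chain)
```

Here the newest point is gone from the node's chain, so we would fall back to the state recorded at `(20, "b")` and synchronize from there. If nothing matches, the persisted data alone is not enough and we need another strategy.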

ch1bo avatar Sep 09 '22 07:09 ch1bo

Shall we figure out how to replay some events to API clients on restart?

pgrange avatar Oct 24 '22 10:10 pgrange