Backup & restore the state of a Hydra Head
What & Why
Currently, there is no mechanism to recover the state of a Hydra Head when the hydra-node restarts (e.g. following a crash). As a consequence, the Hydra Head cannot continue processing transactions and, even worse, the participants have to trust each other to close the Head in a non-adversarial manner.
Persisting the state of the Hydra Head on disk and restoring it on restart will allow the hydra-node to resume operations and recover from unexpected downtime.
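A minimal sketch of the persist-and-restore idea, written in Python only for illustration (it is not how the hydra-node implements persistence). The names `save_state` and `load_state`, the JSON encoding and the file name are all assumptions. The point it shows is that the state file should be replaced atomically, so a crash in the middle of a write cannot corrupt the last good state.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write `state` as JSON without ever leaving a half-written file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reached the disk
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path, initial_state):
    """Return the persisted state, or `initial_state` when nothing was saved yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return initial_state
```

On startup the node would call `load_state` once and, in this sketch, call `save_state` after every state change.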
Out of scope: Longer downtimes (how long depends on the contestation period, a Hydra protocol parameter) are not covered!
Requirements
- The hydra-node can be restarted without losing its knowledge of Hydra Head(s)
- An open Hydra Head can always be closed after restart
- An initiated Hydra Head can always be aborted after restart
- A restarted Hydra node (ideally) can progress in L2 transaction processing
- It's acceptable that a Hydra Head might still not progress, e.g. because of missed network events (related to #188)
To be discussed
- Technical detail: Shall we store the accumulated head state or all incoming events before they are processed?
- Storage format: backward-compatibility, introspect-ability ("white-box" & audit a running Hydra Head?)
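To make the first question above concrete, here is a small sketch of the two options, in Python for illustration only. The event tags, the shape of the state and the function names are hypothetical and not taken from the hydra-node. Option A stores every incoming event and replays the log on restart; option B stores only the accumulated state.

```python
import json
from functools import reduce

INITIAL_STATE = {"status": "Idle", "confirmed_txs": []}


def apply_event(state, event):
    """Pure transition function: old state + event -> new state."""
    if event["tag"] == "HeadOpened":
        return {**state, "status": "Open"}
    if event["tag"] == "TxConfirmed":
        return {**state, "confirmed_txs": state["confirmed_txs"] + [event["tx"]]}
    if event["tag"] == "HeadClosed":
        return {**state, "status": "Closed"}
    return state  # unknown events leave the state untouched


def restore_from_events(log_lines):
    """Option A: replay an append-only log (one JSON-encoded event per line)."""
    events = (json.loads(line) for line in log_lines)
    return reduce(apply_event, events, INITIAL_STATE)


def restore_from_snapshot(snapshot_text):
    """Option B: load the accumulated state written after the last processed event."""
    return json.loads(snapshot_text)
```

Both options yield the same state as long as `apply_event` is deterministic. The event log is easier to audit and can be re-interpreted by a newer version, while the accumulated state restores faster but ties the file format to the current shape of the state.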
Tasks
- [ ] #554
- [x] #257
- [ ] #541
Added some bullets for some subtasks of this.
One thing to consider: When we restore from persistence, we would need to know at what ChainPoint we had been before and resynchronize with the node from that point onward. Otherwise, we might "miss" chain events.
Also, rollbacks:
Could it happen that we were temporarily on a fork, stopped the node, and then restore it wanting to synchronize from the same ChainPoint, while the cardano-node has in the meantime been rolled back off the fork? When we reconnect to it, would we see a rollback or no intersection? How should we handle this, with or without storing the chain state?
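One way to handle the rollback case, assuming the chain state is stored: keep a short history of chain states keyed by the point at which each was observed, and on a rollback drop everything newer than the rollback point. The sketch below is in Python for illustration only; `record`, `rollback`, `current_state` and the dictionary layout are hypothetical names, not hydra-node code.

```python
def record(history, slot, block_hash, chain_state):
    """Return a new history with the state observed at (slot, block_hash) appended."""
    entry = {"slot": slot, "hash": block_hash, "state": chain_state}
    return history + [entry]


def rollback(history, slot):
    """Keep only entries at or before `slot`; newer ones belonged to the abandoned fork."""
    return [entry for entry in history if entry["slot"] <= slot]


def current_state(history, initial_state):
    """The latest surviving chain state, or `initial_state` if nothing survived."""
    return history[-1]["state"] if history else initial_state
```

If this history is part of the persisted data, a rollback received right after restart can be handled the same way as one received while running.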
Furthermore, if we have no intersection, we would need to know alternative past ChainPoints from which we would want to synchronize, i.e. this info needs to be in the persisted data. Right now, that info would be in the linked list of ChainStateAt.
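A sketch of the fallback described above, in Python for illustration only: the persisted points are tried newest first and the first one still on the node's chain becomes the resynchronization point. Here `node_chain` is a stand-in for whatever the cardano-node reports, and `find_intersection` and the `(slot, hash)` tuples are hypothetical. As far as I understand, the chain-sync protocol lets a client offer several candidate points when looking for an intersection, so the selection itself would happen on the node's side.

```python
def find_intersection(candidates, node_chain):
    """Return the first candidate (newest first) that is on `node_chain`, else None."""
    on_chain = set(node_chain)
    for point in candidates:
        if point in on_chain:
            return point
    return None


# Points persisted by the hydra-node, newest first; the newest was seen on a fork.
persisted_points = [(30, "c-fork"), (20, "b"), (10, "a")]

# The chain the node follows after it rolled back off the fork.
node_chain = [(10, "a"), (20, "b"), (30, "c-main")]

resume_from = find_intersection(persisted_points, node_chain)
```

In this example the node resumes from `(20, "b")`. A result of `None` means none of the persisted points is on the chain any more, which the persisted data alone cannot resolve.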
Shall we figure out how to replay some events to API clients on restart?