
Fallback chain owners

Open • afck opened this issue 1 year ago • 0 comments

In addition to super owners and regular owners, chains should have the current validators as fallback owners, who can only become active if there is a sufficiently old protected/bouncing message in the inbox.

This can be used e.g. to migrate lagging chains to new epochs, even if their regular owners are offline. And more generally, it will guarantee that protected messages are eventually received.

Design

There will be a configurable delay D for each chain.

If any unskippable (i.e. protected or tracked) message has been in the inbox for D or longer (as measured subjectively, by the validator's own clock), an honest validator will sign a timeout certificate and request signatures for it from all the other validators as well; if there aren't enough signatures, it updates the others about all unskippable messages in the inbox.

So at most D after the first honest validator does that, some honest validator will succeed in collecting a quorum. It then sends the certificate to all validators, and the chain switches to fallback mode.
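
A minimal Rust sketch of the two checks described above: the per-validator trigger (some unskippable message has waited at least D by the local clock) and the quorum check on collected signatures. All names here (`MessageKind`, `InboxEntry`, `should_sign_timeout`, `quorum_threshold`, `has_quorum`) are hypothetical and not linera-protocol's API, and the quorum rule (strictly more than two thirds of the total voting weight) is an assumption for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical message classification: only tracked and protected
/// (unskippable) messages can trigger the fallback timeout.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MessageKind {
    Skippable,
    Tracked,
    Protected,
}

/// An inbox message together with the local time this validator first saw it.
struct InboxEntry {
    kind: MessageKind,
    received_at: Instant,
}

/// True if some unskippable message has waited at least `delay` (the
/// per-chain D) according to this validator's own clock.
fn should_sign_timeout(inbox: &[InboxEntry], delay: Duration, now: Instant) -> bool {
    inbox.iter().any(|entry| {
        entry.kind != MessageKind::Skippable
            && now.saturating_duration_since(entry.received_at) >= delay
    })
}

/// Assumed quorum rule: strictly more than two thirds of the total weight.
fn quorum_threshold(total_weight: u64) -> u64 {
    2 * total_weight / 3 + 1
}

/// Sums the weights of the validators that signed and compares to the threshold.
fn has_quorum(signer_weights: &[u64], total_weight: u64) -> bool {
    signer_weights.iter().sum::<u64>() >= quorum_threshold(total_weight)
}

fn main() {
    let start = Instant::now();
    let now = start + Duration::from_secs(10);
    let inbox = vec![
        // Skippable messages never trigger the timeout, however old they are.
        InboxEntry { kind: MessageKind::Skippable, received_at: start },
        InboxEntry { kind: MessageKind::Protected, received_at: start + Duration::from_secs(5) },
        InboxEntry { kind: MessageKind::Tracked, received_at: start + Duration::from_secs(8) },
    ];
    // The oldest unskippable message has waited 5s: enough for D = 5s, not for D = 6s.
    assert!(should_sign_timeout(&inbox, Duration::from_secs(5), now));
    assert!(!should_sign_timeout(&inbox, Duration::from_secs(6), now));
    // Four validators of weight 1: three signatures form a quorum, two do not.
    assert!(has_quorum(&[1, 1, 1], 4));
    assert!(!has_quorum(&[1, 1], 4));
    println!("ok");
}
```

Because each honest validator also forwards the unskippable messages it knows about, every honest validator's local timer starts no later than that update, which is what bounds the time to a quorum by D.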

Tasks

  • [x] Add a fallback_delay: Duration to ChainOwnership. (Recommended to be either zero, for public chains, or several days; short delays on chains with super owners risk breaking liveness. It should be shorter than half the lifetime of a committee, though.)
  • [x] #1842
  • [ ] Extend the node service:
    • [ ] Allow adding/removing chains and key pairs to/from the wallet of a running service.
    • [ ] Add an updateValidators mutation that updates all validators about the specified chain and about the messages in that chain's inbox. A kind: [Tracked,Protected] setting makes it ignore skippable messages.
    • [ ] Add a requestFallback mutation that requests a timeout certificate from all validators.
  • [ ] Add a timeout to the workers: Whenever a tracked or protected message in any chain has been in the inbox (subjectively) for fallback_delay, the worker causes the chain to be added to a node service (or starts one if necessary; probably on another machine), and invokes the updateValidators and requestFallback mutations.
  • [ ] Enforce a system-wide maximum for the fallback delay, so all chains are actually forced to handle protected and tracked messages.
  • [ ] When handling a CertifiedBlock certificate from a public/fallback round, a worker checks whether there are still old messages waiting and/or whether the chain ownership changed. If the chain should leave public/fallback mode, the worker removes the chain from the node service.
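
A minimal Rust sketch of the delay constraints and the "keep or remove from the node service" decision from the task list. The `ChainOwnership` here is a simplified stand-in that only models `fallback_delay`; `FallbackDelayError`, `validate_fallback_delay`, `keep_in_node_service`, and the concrete system maximum are hypothetical. Ownership changes are not modeled.

```rust
use std::time::Duration;

/// Simplified stand-in for `ChainOwnership`: only `fallback_delay` is modeled;
/// super owners and regular owners are omitted.
struct ChainOwnership {
    fallback_delay: Duration,
}

/// Reasons a (hypothetical) validation step could reject a fallback delay.
#[derive(Debug, PartialEq, Eq)]
enum FallbackDelayError {
    /// The delay exceeds the assumed system-wide maximum.
    AboveSystemMaximum,
    /// The delay is not shorter than half of a committee's lifetime.
    TooLongForCommittee,
}

/// Checks the two constraints named in the task list.
fn validate_fallback_delay(
    ownership: &ChainOwnership,
    system_max: Duration,
    committee_lifetime: Duration,
) -> Result<(), FallbackDelayError> {
    if ownership.fallback_delay > system_max {
        return Err(FallbackDelayError::AboveSystemMaximum);
    }
    if ownership.fallback_delay >= committee_lifetime / 2 {
        return Err(FallbackDelayError::TooLongForCommittee);
    }
    Ok(())
}

/// After a block from a public/fallback round: public chains (zero delay)
/// always stay in the node service; other chains stay only while some
/// unskippable message is still at least `fallback_delay` old.
fn keep_in_node_service(ownership: &ChainOwnership, oldest_unskippable_age: Option<Duration>) -> bool {
    if ownership.fallback_delay.is_zero() {
        return true;
    }
    oldest_unskippable_age.map_or(false, |age| age >= ownership.fallback_delay)
}

fn main() {
    let day = Duration::from_secs(24 * 60 * 60);
    let system_max = day * 7; // hypothetical system-wide maximum
    let lifetime = day * 30;

    let three_days = ChainOwnership { fallback_delay: day * 3 };
    assert_eq!(validate_fallback_delay(&three_days, system_max, lifetime), Ok(()));

    let ten_days = ChainOwnership { fallback_delay: day * 10 };
    assert_eq!(
        validate_fallback_delay(&ten_days, system_max, lifetime),
        Err(FallbackDelayError::AboveSystemMaximum)
    );

    // With an 8-day committee lifetime, 5 days is not shorter than half of it.
    let five_days = ChainOwnership { fallback_delay: day * 5 };
    assert_eq!(
        validate_fallback_delay(&five_days, system_max, day * 8),
        Err(FallbackDelayError::TooLongForCommittee)
    );

    assert!(keep_in_node_service(&three_days, Some(day * 4)));
    assert!(!keep_in_node_service(&three_days, None));
    let public = ChainOwnership { fallback_delay: Duration::ZERO };
    assert!(keep_in_node_service(&public, None));
}
```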

Public chains can be configured to have a fallback_delay of zero. In that case, the validators run the service with the default message policy, i.e. they do accept the incoming messages. Such chains should also start the round timeouts when skippable messages arrive, and should not allow empty blocks.
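
A minimal Rust sketch of the two public-chain rules above, assuming a chain counts as public exactly when its fallback_delay is zero. `MessageKind`, `message_starts_timer`, and `check_block_not_empty` are hypothetical names, and a block is treated as empty when it has no incoming messages and no operations.

```rust
use std::time::Duration;

/// Hypothetical message classification, as in the design section.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MessageKind {
    Skippable,
    Tracked,
    Protected,
}

/// On public chains (zero delay) every pending message starts a timer,
/// including skippable ones; elsewhere only unskippable messages count.
fn message_starts_timer(fallback_delay: Duration, kind: MessageKind) -> bool {
    fallback_delay.is_zero() || kind != MessageKind::Skippable
}

/// Rejects empty blocks (no incoming messages, no operations) on public chains.
fn check_block_not_empty(
    fallback_delay: Duration,
    incoming_messages: usize,
    operations: usize,
) -> Result<(), &'static str> {
    if fallback_delay.is_zero() && incoming_messages == 0 && operations == 0 {
        Err("empty blocks are not allowed on public chains")
    } else {
        Ok(())
    }
}

fn main() {
    let day = Duration::from_secs(24 * 60 * 60);
    assert!(message_starts_timer(Duration::ZERO, MessageKind::Skippable));
    assert!(!message_starts_timer(day, MessageKind::Skippable));
    assert!(message_starts_timer(day, MessageKind::Tracked));
    assert!(message_starts_timer(day, MessageKind::Protected));
    assert!(check_block_not_empty(Duration::ZERO, 0, 0).is_err());
    assert!(check_block_not_empty(Duration::ZERO, 1, 0).is_ok());
    assert!(check_block_not_empty(day, 0, 0).is_ok());
}
```

Forbidding empty blocks matters here because anyone may propose on a public chain; without that rule, empty proposals could keep rounds busy without handling the pending messages.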

Possible future work:

  • [ ] Allow validators to specify different keys for signing block proposals. These would need to be included in each committee and in the genesis config.
  • [ ] Additional fees (paid by the chain balance?) for:
    • [ ] Starting a public chain, changing a chain's ownership to be public, or a chain switching to fallback mode.
    • [ ] Each block in public/fallback mode.

afck, Feb 05 '24 09:02