penumbra icon indicating copy to clipboard operation
penumbra copied to clipboard

Support separate custody (including multisig) for validator definition and validator voting

Open plaidfinch opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

The signing authority which controls the identity of a Penumbra validator is exactly a Penumbra spendkey. In the simplest custody setup (a single software wallet), whichever wallet is used to broadcast a validator definition is also the same spend authority used to authorize that update to the validator.

The protocol does not, however, require this conflation; indeed, the design of the validator system for Penumbra is deliberately made so that a validator operator can practice rigorous separation of concerns: a validator's non-custodial funding streams need not remit funds to addresses controlled by the key which controls the validator definition.

One possibly desirable validator configuration would be separated custody for the validator's funding (from reward streams, and optionally self-delegation) and the validator's identity. Additionally, it should be possible to custody the validator's identity just as securely (or not) as any other Penumbra spend key, with the choice of custody backend (online software wallet, airgapped software wallet, threshold multisig, hardware wallet) informed by the operator's requirements, not by the limitations of client software.

The above affordances are all permitted by the protocol but not the current client tooling. This issue describes a sequence of steps to resolve all of these problems and permit validator identity custody to be as first-class as any other type of custody in the system.

Describe the solution you'd like

In order to close this gap, I propose the following augmentations to the client tooling. Note that none of them require changing any protobuf or RPC definitions in a way that is backwards-incompatible.

  1. The first step is to permit separated signing and broadcasting of validator-related actions:
  • [x] The command pcli validator definition upload should take an additional argument --signature which should be attached to the validator definition, in place of the default locally-generated signature using this pcli's own custody backend.
  • [x] A new command pcli validator definition sign should take the same arguments as upload (excepting the new addition above), but instead use the local custody backend to produce a signature for that validator definition and output it to the command line.
  • [x] The command pcli validator vote should be turned into a subcommand pcli validator vote cast, to make room in the hierarchy of commands for other vote actions.
  • [x] The command pcli validator vote cast should take an additional argument --signature which should be attached to the validator's vote, in place of the default locally-generated signature using this pcli's own custody backend.
  • [x] A new command pcli validator vote sign should take the same arguments as cast (excepting the new addition above), but instead use the local custody backend to produce a signature for that vote and output it to the command line.

With the above implemented, validator custody can now function across an airgap.

  1. Now it's time to resolve the longstanding issue in the client tooling that separate governance keys are supported by the protocol but not the client tooling:
  • [x] The command pcli init should take a new subcommand governance-key which takes the same arguments as the rest of pcli init, but initializes a separated governance subkey and stores it in the pcli configuration alongside the other root key. If there is already a governance key in the configuration, this command should fail. The custody method for the governance key should not need to match the custody method for the spend key.
  • [x] The commands pcli validator vote cast and pcli validator vote sign should use the configured governance key, if one is present.
  • [x] The command pcli validator definition template should use the configured governance key, if one is present.

Now both the validator identity and the separate governance key can be custodied separately from the spend authority used to broadcast the transactions, and independently from one other.

  1. Finally, the above setup does not yet permit the validator identity to sign definitions or votes, because the current threshold custody backend does not permit signing anything other than transaction plans. To resolve this, the following:
  • [ ] The protobuf message penumbra.custody.threshold.v1.CoordinatorRound1 should be augmented to be a "go style enum" by adding the additional fields validator_definition and validator_vote. Its corresponding domain type should be a Rust enum which expects precisely one of those fields to be present.
  • [ ] The custody RPC penumbra.custody.v1.CustodyService should be augmented with two new RPC methods: AuthorizeValidatorDefinition and AuthorizeValidatorVote, each of which take as input their respective things-to-authorize, as well as the repeated PreAuthorizations, as AuthorizeRequest does, and each of which produce as output a message containing a single crypto.decaf377_rdsa.v1.SpendAuthSignature.
  • [ ] The SoftKMS custody implementation should be updated to implement these new methods.
  • [ ] In pcli, the place where previously the SoftKMS spendkey was directly used to produce signatures for these things, instead the custody backend should be requested to produce a signature.
  • [ ] The threshold custody backend implementation should be enhanced to implement these new methods. This is largely type-directed refactoring starting in the place where the domain type CoordinatorRound1 is made to contain an enum of possible things-to-sign.

Describe alternatives you've considered

The alternative is to not implement this work, or to partially implement it. The above chunks (1-3) are described in dependency order, and I believe also in priority order.

  1. Not doing this prevents any validator from placing their validator identity key — the one thing whose compromise cannot be recovered from, and which cannot be rotated — behind even an airgap, let alone hardware or threshold custody. I think this is important to fix.
  2. Not doing this prevents validators from separating governance keys from their identity key, which means that a validator must choose between higher availability of the governance key vs. higher security of the identity key. I think this is somewhat important to fix, but validators could likely make do.
  3. Not doing this prevents validators from keeping their identity and/or governance keys in threshold custody, which is substantially less cumbersome, error-prone, and vulnerable to catastrophic loss than is an airgap setup. Validators could make do without threshold custody, but it would be highly desirable for them to be able to use it to safeguard their identity key just as effectively and flexibly as they can safeguard their funds.

Additional context

I think this is probably 2–4 days of work: 1–2 days to implement parts (1-2) to permit remote authorization flow, and another 1–2 days to make threshold signing work for validator definitions and validator votes.

plaidfinch avatar Feb 13 '24 19:02 plaidfinch

On the "go style enum" point, can we change the message to a proto oneof to have proper extensibility in the future? This is sometimes and sometimes not possible without breaking compatibility. The Buf docs have more details. It would be good to use a oneof if possible.

hdevalence avatar Mar 11 '24 02:03 hdevalence

On the "go style enum" point, can we change the message to a proto oneof to have proper extensibility in the future? This is sometimes and sometimes not possible without breaking compatibility. The Buf docs have more details. It would be good to use a oneof if possible.

Looking here, it looks like yes we can:

  1. Removing a field from a oneof is a backward incompatible
  2. Moving existing fields into a oneof is a backward incompatible change, unless it is a single field being moved into a new oneof.

This change is (it seems overzealously?) flagged as backwards-incompatible by the buf lint.

plaidfinch avatar Mar 23 '24 00:03 plaidfinch

This is completed by #4037, #3986, and #3985, collectively.

plaidfinch avatar Mar 31 '24 01:03 plaidfinch