penumbra icon indicating copy to clipboard operation
penumbra copied to clipboard

Release tracking issue: structure crypto primitives to support multiple field arithmetic implementations

Open hdevalence opened this issue 2 years ago • 8 comments

Is your feature request related to a problem? Please describe.

Our existing crypto primitives are all built on top of field arithmetic implementations pulled from Arkworks, originally for convenience, since we could pull them off the shelf and start using them. However, we're now experiencing pain points exposing the limitations of this approach:

  • In the hardware wallet context, we can't easily run our crypto code in a no_std+no_alloc context, since the Arkworks traits require allocation.
  • In the WASM context, we have terrible performance because Arkworks implements big-integer field arithmetic using 64-bit words and a 128-bit multiplier, but because WASM does not provide such a multiplier, each "base" operation must itself be emulated, causing a huge slowdown.

To deal with both of these problems, we need to take ownership of deeper parts of our cryptography primitives, and ensure our implementations are tuned to our needs. In particular:

  • We need to be able to use a minimal subset of our cryptography (everything needed to compute an EffectHash and a SpendAuth signature) in a no_std+no_alloc context.
  • We need to be able to select between 32- and 64-bit field arithmetic implementations. We want to use 64-bit arithmetic by default, but be able to select 32-bit arithmetic for hardware wallets (no proving, no Arkworks, no_std, no_alloc) and for WASM (proving, with Arkworks)

Describe the solution you'd like

Here's a proposed approach that I believe could address both of these issues.

  • [x] penumbra-zone/decaf377#60

    We'll need one implementation for each combination of field and bitsize:

    • [x] Fp32: proving curve base field, 32-bit
    • [x] Fq32: proving curve scalar field / decaf377 base field, 32-bit
    • [x] Fr32: decaf377 scalar field, 32-bit
    • [x] Fp64: proving curve base field, 64-bit
    • [x] Fq64: proving curve scalar field / decaf377 base field, 64-bit
    • [x] Fr64: decaf377 scalar field, 64-bit

    The field arithmetic should not have external dependencies, should not use the Arkworks traits, and should not allocate. This is a lot of field arithmetic implementations. Instead of writing them by hand, let's use fiat-crypto's Rust backend to generate formally verified field arithmetic for each of our field / wordsize combinations. Ideally, we won't have to edit the generated code much or at all, but we'll need to do some discovery there.

    Each of these field arithmetic implementations should be in separate submodules (e.g., crate::fields::{u32::{fp::{Fp32, ...}, ...}, ...}, using the facade pattern for re-exports.

  • [x] https://github.com/penumbra-zone/decaf377/issues/65

    The initial wrapper types don't have inversion implemented. We'll need that.

  • [x] penumbra-zone/decaf377#61

    Using a similar strategy as I designed for curve25519-dalek, provide a way to select a backend at compile-time:

    • [x] Add u32_backend, u64_backend features in the crate
    • [x] Set the u64_backend as a default feature
    • [x] Add #[cfg]-gated type aliases, so that
      • crate::fields::{Fp, Fq, Fr} = crate::fields::{Fp32, Fq32, Fr32} when the u32_backend is enabled;
      • crate::fields::{Fp, Fq, Fr} = crate::fields::{Fp64, Fq64, Fr64} when the u64_backend is enabled;
      • compilation fails with a helpful error if either (a) no backend is selected or (b) multiple backends are selected.

    Note that this approach is slightly "illicit", in that crate features are supposed to be purely additive. This will mean we'll need to propagate annoying feature selection logic through our dependency tree (see the way that curve25519-dalek and x25519-dalek or bulletproofs backend features interact). However, the really big upside, the upside that makes it all worthwhile, is that unlike a generics-based approach, the backend selection can be transparent to depending code (though not to depending Cargo.toml configurations).

  • [x] penumbra-zone/decaf377#62

    Next, restore compatibility between our new field arithmetic implementations and the Arkworks stack. This will involve writing implementations of the Arkworks field arithmetic traits that wrap whatever interface the fiat-crypto codegen creates. The trait implementations should be in separate files, so we can feature-gate the implementation under an arkworks feature in a single spot (the submodule declaration). Since trait implementations are global, there's no re-exporting required, trait impls can be tucked away in a submodule.

    We can do a single trait impl first, and then use copy-paste and search-and-replace technology to replicate it across the other fields. Assuming that the fiat-crypto codegen produces the same interface, independent of limb size, we could potentially implement the compat traits on the Fp, Fq, Fr type aliases, rather than on the backend types.

  • [x] https://github.com/penumbra-zone/decaf377/issues/69

  • [x] https://github.com/penumbra-zone/decaf377/issues/70

  • [x] penumbra-zone/decaf377#63

    We need to provide a minimal subset of decaf377 operations in a no_std+no_alloc context. To do this, we'll need to have a basic elliptic curve implementation that doesn't depend on ark-ec or any other part of Arkworks. Minimal means all the operations that would be required to produce signatures and effect hashes:

    • [x] Decode + Encode
    • [x] Map-to-Group
    • [x] Add, Sub
    • [x] Constant-time scalar multiplication (no lookup tables or anything)

    Other functionality can be provided by feature-gated wrappers around calls to methods in ark-ec (see above). If these need to translate between point representations, the extra memcpy's won't be a big deal. For instance, signature verification uses a double-base variable-time scalar multiplication, but this is much more expensive than converting point representations (as long as we're not doing an inversion or something).

  • [x] #3673

    The decaf377-rdsa crate should have matching feature-gating to allow selection of u32 or u64 backends, and to avoid Arkworks as a required dependency. We only care about producing signatures in the minimal no_std+no_alloc context, so it's fine if the verification methods become feature-gated.

  • [x] #3674

    Key agreement is also used for effect hash computations, so we'll need to use it in the minimal context and have corresponding Cargo features.

  • [ ] #3675

    We need to use the Poseidon code in the minimal context, too. We can have that crate depend on decaf377 for the field arithmetic implementations and have backend selection features. We'll need to be able to avoid an Arkworks dependency for computing hash values, and only rely on it when making proofs.

  • [ ] #3676

    It should be possible to re-instantiate all of the circuit implementations with the backend-selectable Fp, Fq, Fr implementations in the decaf377 crate. This allows unlocking WASM performance by avoiding the 128=>64-bit translation overhead.

hdevalence avatar Dec 17 '23 08:12 hdevalence

I think this subsumes penumbra-zone/penumbra#3512.

hdevalence avatar Dec 17 '23 08:12 hdevalence

fiat-crypto has a web interface, neat: https://mit-plv.github.io/fiat-crypto/?argv=%5B%22word-by-word-montgomery%22%2C%22--lang%22%2C%22Rust%22%2C%22fq%22%2C%2232%22%2C%228444461749428370424248824938781546531375899335154063827935233455917409239041%22%5D&interactive

hdevalence avatar Dec 17 '23 08:12 hdevalence

Some output for Fq: https://gist.github.com/hdevalence/6556634efe15f660919561d2d55b8443

hdevalence avatar Dec 17 '23 08:12 hdevalence

Some output for Fr: https://gist.github.com/hdevalence/3b3c9f9574565b1eaf78cedd888081c7

hdevalence avatar Dec 17 '23 17:12 hdevalence

@hdevalence It's worth pointing out that 32-bit limbs in the WASM context isn't always the ideal limb size depending on the curve construction used. For instance, reference https://github.com/mitschabaude/montgomery and https://hackmd.io/HKHbMzxIQmOQUFNl7CF5sg where 13 x 30 bit montgomery multiplication is optimal. In the GPU environment, since the max unsigned integer type is 32-bits, the max limb-size supported is 16-bits wide. There we see the optimal limb size to be 13-bit limbs.

TalDerei avatar Dec 19 '23 20:12 TalDerei

For the contexts we care about -- the Cortex-M3 on a Ledger, and wasm in the browser -- we do have a 64-bit multiplier, so we can use 32-bit limbs there. I suspect that code sharing with a GPU will be more difficult, so I'd like this implementation to focus on the 32- and 64-bit cases.

hdevalence avatar Dec 24 '23 22:12 hdevalence

I did the first part -- generating code with fiat-crypto -- in https://github.com/penumbra-zone/decaf377/pull/64

I think a good next step would be to work on filling in arkworks compatibility, making it so that the new Fp, Fq, Fr types implement the arkworks PrimeField trait when the arkworks feature is enabled. I created arkworks.rs files for those impls to live in but didn't populate them.

hdevalence avatar Dec 25 '23 00:12 hdevalence

Great! I'll pick up from there tonight

TalDerei avatar Dec 25 '23 00:12 TalDerei