orion icon indicating copy to clipboard operation
orion copied to clipboard

convert newtypes to const generics

Open vlmutolo opened this issue 2 years ago • 14 comments

This is still a WIP. The basic method was to define an inner type that behaves the way we want it to, e.g. with special Drop impls, etc. Then for each newtype we want to create, we create a small wrapper over the inner type, like struct Digest(Blake2bArray), and implement two small traits, essentially From and AsRef, to convert to that inner type.

In another module, we provide more complex blanket implementations for any types that implement From and AsRef for our inner type. This allow us to, for example, auto-implement len, from_slice, etc for the newtypes without exposing the inner type (which could be hazardous for inner types like the planned PrivateArray type.

vlmutolo avatar Feb 03 '22 00:02 vlmutolo

So there are lots of problems with this branch as-is.

  • It's missing lots of documentation.
  • The location/name of the base.rs module is probably not ideal.
  • There's probably a better way to do these trait gymnastics than the OrionDeref trait I defined.

So this is more of a proof-of-concept that we might be able to use traits and const generics to achieve both implementation deduplication and the safety of newtypes.

@brycx Let me know what you think. I'm probably going to try this same thing on some kind of secret data type tomorrow night to see if the same strategy would work there. The inner type will have some more complex logic, but I think it should still work.

vlmutolo avatar Feb 03 '22 01:02 vlmutolo

So I tried another iteration, this time splitting up PublicData and SecretData. It's simpler that way, and more easily allows us to define separate methods on them without worrying about trait bounds and Public/Secret marker traits (those marker traits were removed).

Basically, the new design is based around two structs:

struct PublicData<B, C> {
    bytes: B,
    context: C,
}

struct SecretData<B, C> {
    bytes: B,
    context: C,
}

The B type parameter allows us to use different byte storage objects to hold the underlying data, such as arrays, Vecs, and maybe even eventually bytes::Bytes. Both PublicData and SecretData share these underlying storage mechanisms.

The C type parameter stands for "context", and allows us to express the purpose of the data store. Practically, it ensures we never accidentally misuse a data store meant for a different purpose. If we have SecretData<_, EdPubKey> and SecretData<_, EdPrivKey>, we can't accidentally mix them up.


A good entrypoint into trying to figure out this design would be the module-level docs at orion::hazardous::base in this branch. There are longer-form examples and more prose describing the motivation.

vlmutolo avatar Apr 03 '22 03:04 vlmutolo

In order for us to evaluate these different strategies for viability (or non-viability), I think it would be good to come up with some criteria. Here's a first-draft list of what we would need from a successful design around const-generics. @brycx Let me know if you have comments/edits.

  • [x] Define newtypes (not aliases) for various types such that they can't be erroneously mixed together
  • [x] Avoid source code duplication across newtypes. Right now we're at about 7 or 8 LoC per newtype, and it's all data entry (e.g. defining min and max size for the type).
  • [x] (Obviously) support all the various methods we already have defined via macros.
  • [x] Define separate methods for public vs. secret types.
  • [x] Per-type documentation. It's annoying to see documentation for an HMAC SHA256 Tag when we're looking at a PasswordHash. The current macro design also has this issue. See below for some notes on the current docs situation in this iteration of the const-generics design.
  • [x] (?) Define ad-hoc additional methods per context. For example, we could define some random extra methods on SecretData<VecData, PasswordHash> if for some reason we wanted methods specific to a password hash.

vlmutolo avatar Apr 03 '22 04:04 vlmutolo

Right now the documentation situation is… meh. Anyone trying to read through the docs for either SecretData or PublicData will get bogged down in generic types if they look too closely. That said, the methods available on these types are pretty simple (basically from/to slice, get len, etc.). Maybe normal users don't need to really understand the generics.

It's nice that the SecretData and PublicData docs are split up in this iteration. Having the secret/public data methods intermingled was something I really didn't like about the previous design.

Also, we can still have documentation on the type alias itself. If we look at the docs for orion::hazardous::hash::blake2::blake2b::Digest in this branch, we see some (currently nonsensical) documentation for the type alias itself. I'm not sure what we'd want to put there vs. in the module-level docs (which is what we use currently), but it's an option.

vlmutolo avatar Apr 03 '22 04:04 vlmutolo

In order for us to evaluate these different strategies for viability (or non-viability), I think it would be good to come up with some criteria.

So far, I haven't come across additional criteria. Most of it seems to be covered by what you found!

brycx avatar Apr 17 '22 09:04 brycx

Recording here the original motivation for trying to replace the macro implementations. There were basically three reasons.

Byte Storage Parameterization

It's desirable for us (and maybe users) to be able to easily configure the storage mechanism used by the Orion APIs. For example, the type emitted by orion::auth::authenticate is orion::auth::Tag, which is an array. We can imagine an application where an Orion user is frequently reading Tags from the network and verifying them.

Right now, the user would have to copy the data from the network buffer into an owned Tag for verification. This is because the Tag type is currently hard-coded to be an array. Ideally, we would be able to support more than one byte storage type for things like Tag. We could maybe support a tag that holds a byte slice, or even a bytes::Bytes object, which is common in networking code.

We could probably achieve this while keeping macros by adding another parameter to the macro_rules! definition for the byte storage. Though, we'd also have to do one newtype per storage type per type parameter (PublicData<Array, Ctx0>, PublicData<Vec, Ctx0>, PublicData<Array, Ctx1>, PublicData<Vec, Ctx1>, etc.)

Readability / Auditability

Macros frequently pose challenges for those trying to investigate the implementation of specific types. As the most recent example, when I tried to figure out what the backing storage type of orion::auth::Tag was, I went to the docs page, clicked on auth, then Tag, and finally on "source", which brings you to this construct_tag! macro invocation. In order to understand more, I had to open my local copy of Orion, rg for macro_rules! construct_tag, and go check out that file.

Types defined using the const-generics+traits+newtypes have a better story here. After clicking on Tag, I would have seen that it was a newtype for something like PublicData<ArrayData, Blake2bTag>, and then I could have clicked through to each of those three individual types to learn more.

Documentation

This is a minor point, but if we check out that same Tag documentation (this also applies to most of the newtypes we have defined), we find an example that starts with:

use orion::hazardous::mac::hmac::sha512::Tag;

This is the wrong type, which is a bit confusing. Ideally it would say something like:

use orion::auth::Tag;

We actually see the same problem with the poly1305::Tag. The poly1305::Tag documentation shows the same example starting with orion::hazardous::mac::hmac::sha512::Tag. The underlying issue is that the macro_rules! construct_tag has the HMAC documentation inside it, which is necessarily shared with every newtype created by the construct_tag macro.

We can probably solve this without moving away from macros. (What would happen if we just moved the docs to above the declaration of the newtype from out of the body of the macro_rules!?) However, I think we can do a bit better by having specific documentation on the primary types and their operations (PublicData/PrivateData), the contexts (e.g. Blake2bTag), and the byte storage types (ArrayData/VecData/etc.).

vlmutolo avatar Apr 17 '22 16:04 vlmutolo

I think many of the traits that we define for these newtypes should be contained within a private module so that we avoid users being able to define their own newtypes perhaps?

… Just wondering if users'd be able to make invalid newtypes and pass them to a primitive's API.

I think it would be okay for users to define their own newtypes, but we should think carefully about this.

Users creating their own newtypes — if they want those newtypes to be usable with the Orion API — would only be able to customize the byte storage. They could certainly still do problematic things with the byte storage, such as logging the underlying bytes to stdout on creation or choosing random subsets of bytes to return on each AsRef call.

But these traits are in the hazardous module, and that's kind of what the module is there for. There are plenty of ways to shoot yourself in the foot by using functions and traits defined in hazardous. Adding "dangerous" traits to the mix doesn't seem like a big change from status quo.

If we wanted to prevent users from implementing traits like Bounded (should probably be renamed to Bound) and Generate, we'd probably have to "seal" the traits instead of making them private. I think we need them to be public if public methods on e.g. PrivateData are restricted by those traits.

vlmutolo avatar Apr 17 '22 20:04 vlmutolo

I mentioned in my last comment that users "would only be able to customize the byte storage". I should justify this. There are some subtleties here, and decisions to be made.

TL;DR: We can punt on this for now. I think the default should be that users can implement these traits on their own types, but that won't do them any good because all the Orion functions will require very specific types defined within Orion.


The reason they can only customize the byte storage is because I might expect Orion functions to take inputs like:

type SecretKey<B> = SecretData<B, XChaCha20Poly1305Aead>;
type Tag = PublicData<Array, Poly1305Tag>;

pub fn authenticate<B: Bound>(secret_key: &SecretKey<B>, data: &[u8]) -> Result<Tag, UnknownCryptoError>;

Basically, I'm saying that we can allow certain Orion functions to be generic over the byte storage type, which would allow us to take advantage of the byte storage parameter. I can't really imagine a scenario where we'd want to be generic over the context. The whole purpose of the context parameter was to prevent use of the wrong types that are otherwise identical.

That said, for this initial iteration, we could just sidestep this problem and say "you have to use exactly what type we give you", which is closest to what we do now. We can then open a separate issue about how/whether to allow parameterizing over the type.

The only thing I'm a little concerned about is this PartialEq impl:

impl<B0, B1, C> PartialEq<PublicData<B1, C>> for PublicData<B0, C>
where
    B0: AsRef<[u8]>,
    B1: AsRef<[u8]>,

I'm going to restrict this such that the B parameter must be the same, and the new signature would become:

impl<B: AsRef<[u8]>, C> PartialEq for PublicData<B, C>

This is significantly simpler and also doesn't allow users to compare against Orion types with their own custom types. This can be part of the "should users be able to create byte parameters" discussion.

vlmutolo avatar Apr 17 '22 21:04 vlmutolo

@brycx This is more or less in a place where I'm looking for "final" feedback on the general design. After that, I'm ready to move forward and actually implement all the public types in terms of const generics, and try to take this PR out of "draft" status.

When you have a chance, could you take a look at how the tests are declared per type? Here's an example for the AEAD SecretKey type's tests, and here's an example for Blake2b.

vlmutolo avatar Jul 18 '22 02:07 vlmutolo

@vlmutolo I think this design is good. I'd say it's ready to go on to the next round.

Re the tests: I think they look good. I took a quick look at the macros and so on; they seem not too cluttered and should offer the same coverage as the previous ones. Unless there's something specific you'd want me to look into there, I'd also say this is in good shape!

Just a note: I couldn't tell a major difference to what I reviewed last time. So if there's anything specific I need to take a look at again, feel free to let me know.

brycx avatar Aug 02 '22 14:08 brycx

There are a couple TODOs left in the code that I could use a second opinion on. Nothing major, and nothing that can't wait until we have a full, non-draft PR to look at.

For example, here's one regarding the naming of one of the "context" types.

I'll start filling in the rest of the types, and then we can revisit the minor TODOs (mostly just nitpicks).

vlmutolo avatar Aug 02 '22 14:08 vlmutolo

The TODOs I saw didn't seem too major, but maybe I missed some. The naming I've found no problem with yet, but in the case we want to change some, can always replace all references - which most IDEs support.

brycx avatar Aug 02 '22 16:08 brycx

True, but the context parameter names do technically show up in the public interface, even if usually hidden behind type defs. Once this PR gets closer to finished I'll try to think through the various context parameter names.

vlmutolo avatar Aug 02 '22 19:08 vlmutolo

Sounds like a very good plan.

brycx avatar Aug 02 '22 20:08 brycx