orion
orion copied to clipboard
convert newtypes to const generics
This is still a WIP. The basic method was to define an inner type
that behaves the way we want it to, e.g. with special Drop impls,
etc. Then for each newtype we want to create, we create a small
wrapper over the inner type, like struct Digest(Blake2bArray)
,
and implement two small traits, essentially From and AsRef, to
convert to that inner type.
In another module, we provide more complex blanket implementations
for any types that implement From and AsRef for our inner type.
This allow us to, for example, auto-implement len
, from_slice
,
etc for the newtypes without exposing the inner type (which could
be hazardous for inner types like the planned PrivateArray
type.
So there are lots of problems with this branch as-is.
- It's missing lots of documentation.
- The location/name of the
base.rs
module is probably not ideal. - There's probably a better way to do these trait gymnastics than the
OrionDeref
trait I defined.
So this is more of a proof-of-concept that we might be able to use traits and const generics to achieve both implementation deduplication and the safety of newtypes.
@brycx Let me know what you think. I'm probably going to try this same thing on some kind of secret data type tomorrow night to see if the same strategy would work there. The inner type will have some more complex logic, but I think it should still work.
So I tried another iteration, this time splitting up PublicData
and SecretData
. It's simpler that way, and more easily allows us to define separate methods on them without worrying about trait bounds and Public
/Secret
marker traits (those marker traits were removed).
Basically, the new design is based around two structs:
struct PublicData<B, C> {
bytes: B,
context: C,
}
struct SecretData<B, C> {
bytes: B,
context: C,
}
The B
type parameter allows us to use different byte storage objects to hold the underlying data, such as arrays, Vec
s, and maybe even eventually bytes::Bytes
. Both PublicData
and SecretData
share these underlying storage mechanisms.
The C
type parameter stands for "context", and allows us to express the purpose of the data store. Practically, it ensures we never accidentally misuse a data store meant for a different purpose. If we have SecretData<_, EdPubKey>
and SecretData<_, EdPrivKey>
, we can't accidentally mix them up.
A good entrypoint into trying to figure out this design would be the module-level docs at orion::hazardous::base
in this branch. There are longer-form examples and more prose describing the motivation.
In order for us to evaluate these different strategies for viability (or non-viability), I think it would be good to come up with some criteria. Here's a first-draft list of what we would need from a successful design around const-generics. @brycx Let me know if you have comments/edits.
- [x] Define newtypes (not aliases) for various types such that they can't be erroneously mixed together
- [x] Avoid source code duplication across newtypes. Right now we're at about 7 or 8 LoC per newtype, and it's all data entry (e.g. defining min and max size for the type).
- [x] (Obviously) support all the various methods we already have defined via macros.
- [x] Define separate methods for public vs. secret types.
- [x] Per-type documentation. It's annoying to see documentation for an HMAC SHA256
Tag
when we're looking at aPasswordHash
. The current macro design also has this issue. See below for some notes on the current docs situation in this iteration of the const-generics design. - [x] (?) Define ad-hoc additional methods per context. For example, we could define some random extra methods on
SecretData<VecData, PasswordHash>
if for some reason we wanted methods specific to a password hash.
Right now the documentation situation is… meh. Anyone trying to read through the docs for either SecretData
or PublicData
will get bogged down in generic types if they look too closely. That said, the methods available on these types are pretty simple (basically from/to slice, get len, etc.). Maybe normal users don't need to really understand the generics.
It's nice that the SecretData
and PublicData
docs are split up in this iteration. Having the secret/public data methods intermingled was something I really didn't like about the previous design.
Also, we can still have documentation on the type
alias itself. If we look at the docs for orion::hazardous::hash::blake2::blake2b::Digest
in this branch, we see some (currently nonsensical) documentation for the type alias itself. I'm not sure what we'd want to put there vs. in the module-level docs (which is what we use currently), but it's an option.
In order for us to evaluate these different strategies for viability (or non-viability), I think it would be good to come up with some criteria.
So far, I haven't come across additional criteria. Most of it seems to be covered by what you found!
Recording here the original motivation for trying to replace the macro implementations. There were basically three reasons.
Byte Storage Parameterization
It's desirable for us (and maybe users) to be able to easily configure the storage mechanism used by the Orion APIs. For example, the type emitted by orion::auth::authenticate
is orion::auth::Tag
, which is an array. We can imagine an application where an Orion user is frequently reading Tag
s from the network and verifying them.
Right now, the user would have to copy the data from the network buffer into an owned Tag
for verification. This is because the Tag
type is currently hard-coded to be an array. Ideally, we would be able to support more than one byte storage type for things like Tag
. We could maybe support a tag that holds a byte slice, or even a bytes::Bytes
object, which is common in networking code.
We could probably achieve this while keeping macros by adding another parameter to the macro_rules!
definition for the byte storage. Though, we'd also have to do one newtype per storage type per type parameter (PublicData<Array, Ctx0>
, PublicData<Vec, Ctx0>
, PublicData<Array, Ctx1>
, PublicData<Vec, Ctx1>
, etc.)
Readability / Auditability
Macros frequently pose challenges for those trying to investigate the implementation of specific types. As the most recent example, when I tried to figure out what the backing storage type of orion::auth::Tag
was, I went to the docs page, clicked on auth
, then Tag
, and finally on "source", which brings you to this construct_tag!
macro invocation. In order to understand more, I had to open my local copy of Orion, rg
for macro_rules! construct_tag
, and go check out that file.
Types defined using the const-generics+traits+newtypes have a better story here. After clicking on Tag
, I would have seen that it was a newtype for something like PublicData<ArrayData, Blake2bTag>
, and then I could have clicked through to each of those three individual types to learn more.
Documentation
This is a minor point, but if we check out that same Tag
documentation (this also applies to most of the newtypes we have defined), we find an example that starts with:
use orion::hazardous::mac::hmac::sha512::Tag;
This is the wrong type, which is a bit confusing. Ideally it would say something like:
use orion::auth::Tag;
We actually see the same problem with the poly1305::Tag
. The poly1305::Tag
documentation shows the same example starting with orion::hazardous::mac::hmac::sha512::Tag
. The underlying issue is that the macro_rules! construct_tag
has the HMAC documentation inside it, which is necessarily shared with every newtype created by the construct_tag
macro.
We can probably solve this without moving away from macros. (What would happen if we just moved the docs to above the declaration of the newtype from out of the body of the macro_rules!
?) However, I think we can do a bit better by having specific documentation on the primary types and their operations (PublicData
/PrivateData
), the contexts (e.g. Blake2bTag
), and the byte storage types (ArrayData
/VecData
/etc.).
I think many of the traits that we define for these newtypes should be contained within a private module so that we avoid users being able to define their own newtypes perhaps?
… Just wondering if users'd be able to make invalid newtypes and pass them to a primitive's API.
I think it would be okay for users to define their own newtypes, but we should think carefully about this.
Users creating their own newtypes — if they want those newtypes to be usable with the Orion API — would only be able to customize the byte storage. They could certainly still do problematic things with the byte storage, such as logging the underlying bytes to stdout on creation or choosing random subsets of bytes to return on each AsRef
call.
But these traits are in the hazardous module, and that's kind of what the module is there for. There are plenty of ways to shoot yourself in the foot by using functions and traits defined in hazardous. Adding "dangerous" traits to the mix doesn't seem like a big change from status quo.
If we wanted to prevent users from implementing traits like Bounded
(should probably be renamed to Bound
) and Generate
, we'd probably have to "seal" the traits instead of making them private. I think we need them to be public if public methods on e.g. PrivateData
are restricted by those traits.
I mentioned in my last comment that users "would only be able to customize the byte storage". I should justify this. There are some subtleties here, and decisions to be made.
TL;DR: We can punt on this for now. I think the default should be that users can implement these traits on their own types, but that won't do them any good because all the Orion functions will require very specific types defined within Orion.
The reason they can only customize the byte storage is because I might expect Orion functions to take inputs like:
type SecretKey<B> = SecretData<B, XChaCha20Poly1305Aead>;
type Tag = PublicData<Array, Poly1305Tag>;
pub fn authenticate<B: Bound>(secret_key: &SecretKey<B>, data: &[u8]) -> Result<Tag, UnknownCryptoError>;
Basically, I'm saying that we can allow certain Orion functions to be generic over the byte storage type, which would allow us to take advantage of the byte storage parameter. I can't really imagine a scenario where we'd want to be generic over the context. The whole purpose of the context parameter was to prevent use of the wrong types that are otherwise identical.
That said, for this initial iteration, we could just sidestep this problem and say "you have to use exactly what type we give you", which is closest to what we do now. We can then open a separate issue about how/whether to allow parameterizing over the type.
The only thing I'm a little concerned about is this PartialEq
impl:
impl<B0, B1, C> PartialEq<PublicData<B1, C>> for PublicData<B0, C>
where
B0: AsRef<[u8]>,
B1: AsRef<[u8]>,
I'm going to restrict this such that the B
parameter must be the same, and the new signature would become:
impl<B: AsRef<[u8]>, C> PartialEq for PublicData<B, C>
This is significantly simpler and also doesn't allow users to compare against Orion types with their own custom types. This can be part of the "should users be able to create byte parameters" discussion.
@brycx This is more or less in a place where I'm looking for "final" feedback on the general design. After that, I'm ready to move forward and actually implement all the public types in terms of const generics, and try to take this PR out of "draft" status.
When you have a chance, could you take a look at how the tests are declared per type? Here's an example for the AEAD SecretKey
type's tests, and here's an example for Blake2b.
@vlmutolo I think this design is good. I'd say it's ready to go on to the next round.
Re the tests: I think they look good. I took a quick look at the macros and so on; they seem not too cluttered and should offer the same coverage as the previous ones. Unless there's something specific you'd want me to look into there, I'd also say this is in good shape!
Just a note: I couldn't tell a major difference to what I reviewed last time. So if there's anything specific I need to take a look at again, feel free to let me know.
There are a couple TODOs left in the code that I could use a second opinion on. Nothing major, and nothing that can't wait until we have a full, non-draft PR to look at.
For example, here's one regarding the naming of one of the "context" types.
I'll start filling in the rest of the types, and then we can revisit the minor TODOs (mostly just nitpicks).
The TODOs I saw didn't seem too major, but maybe I missed some. The naming I've found no problem with yet, but in the case we want to change some, can always replace all references - which most IDEs support.
True, but the context parameter names do technically show up in the public interface, even if usually hidden behind type
defs. Once this PR gets closer to finished I'll try to think through the various context parameter names.
Sounds like a very good plan.