mbedtls
mbedtls copied to clipboard
PSA Sign Hash (Interruptible) Design
Description
A design for the psa_sign_hash (interruptible) interface. (Implementation to follow once design agreed)
Status
READY
Requires Backporting
NO
Migrations
NO
Todos
- [ ] Tests
CI Fail is due to 'missing' error code. I could fix this by defining it in file for the time being, if this is acceptable.
You need to define the new error code, either in crypto_values.h
or in crypto_extra.h
.
I had a read-through and left some notes, nothing major I think.
I really like the split between start
and complete
, which I think avoids a lot of problems in usage and in implementation. I think as a result, most of the "with the same parameters" requirements become superfluous, which is a good thing, but the documentation should reflect this.
IMO the main question is about the scope of this; is it possible that this will be extended:
- To other operations? IMO yes, at least ECDH. Not a problem in practice, but perhaps naming should reflect this.
- To the same operations based on other kinds of maths? Like RSA sign/verify (though perhaps it's a thing of the past in the controlled environments were super-constrained devices live - except perhaps RSA verify which is still fast, but then perhaps it doesn't need to be interruptible)? Or some PQ schemes (might be more likely)? In that case it's not really clear to me that having a single
set_max_ops()
setting with work. The duration of a "basic" operation already varies form curve to curve, but it's going to be wildly different for RSA and various PQ schemes. Do we need to allow onemax_ops
setting per family of algs? Or do we just hope that the super-constrained devices that are going to use this are likely to only use one family?
Ah, just thought of more things right after posting my last comment. Is the global setting max_ops
a good idea? Shouldn't this be per operation instead, like one of the following?
void psa_interruptible_set_max_ops( psa_sign_interruptible_context *ctx, size_t max_ops );
/* or */
psa_status_t psa_sign_hash_start( psa_sign_interruptible_context *ctx, size_t max_ops,
/* ... */ );
This would neatly solve the issue about "basic operations" having different meanings for various curves, and potentially in the future for primitives based on various kinds of maths.
Also avoids questions like "Am I allowed to call set_max_ops()
between start()
and complete()
and what would that mean?"
Note however that making max_ops
not a global setting would create a mismatch with the existing Mbed TLS implementation and API. However, I don't think that should prevent use from making that move if we think it's the right one: designing the PSA Crypto API is an opportunity to make the API better, such opportunities are rare and we should use them when we can. (We can probably extend the Mbed TLS API in a compatible way so that it supports this.)
I've been thinking: on one hand, providing max_ops
as an argument to start()
has a lot going for it, as it's the point where you also pass the key, which basically determines what "basic operation" means, so this neatly solves the problem that "basic operation" has variable meaning by only specifying max_ops
in a context that clarifies its meaning. This is great when the application does the signature/verification directly.
On the other hand, when the operation happens via another layer (X.509, TLS), this layer is now responsible for deciding what max_ops
to pass. We could provide things like mbedtls_ssl_conf_set_interruptible_max_ops()
but we run again into the problem that we don't know what "basic operation" means until the curve (or other) has been chosen, which will happen as part of the handshake - so we haven't solved the problem just moved it to other layers.
Now, just throwing ideas around, not saying they're any good, but trying to stir up discussion:
- TLS/X.509 could accept a callback function that takes as inputs a key type and bitsize, and return the value of
max_ops
to use; -
psa_interruptible_set_max_ops()
could take as input a list of (key type, bitsize, max ops) triplets (kind of like the above, but at the PSA level and with a table instead of callback). - The above two are just me overthinking the problem, people can just pick the lowest value in the above table and use it as they global setting: it's not a problem if functions return more often than they need to. (Edit: I guess this is the reasoning behind using
1
as the default value, which seems sensible to me.)
I think in practice, "how much time is it acceptable to block" does not depend on the operation but on the system: in that sense, a single global setting makes sense. The problem is, the API only provides "how many 'basic operations'" and how that translates to "how much time" depends on the key...
Apologies for the force push. As per Gilles request I have not only made changes, but also moved various parts of the code into different locations, so it was going to be something of a mess anyway.
I'm marking this as approved for design. We'll still make minor changes to the API, but the fundamentals are good.
General note about this PR: it's in the Mbed TLS repo but is written as a specification that may have multiple implementations. At some point while implementing this, we'll need to replace all places that says "XXX is implementation defined" with a description of what our implementation actually does.
I'm sorry for having such thought that late in the process, but I've just been thinking about the naming for start/complete functions: should rename start
to setup
for consistency with multi-part operations? setup()
is usually the function that makes the operation "active" meaning from this point you need to either bring it to a successful completion or abort()
it.
If this is similar enough to existing multi-part operations, the similarity should be reflected in the names chosen; just as well, if this is different enough, this should be reflected in the names. Note: currently we have similar names for the type xxx_operation_t
and the function xxx_abort()
but not for other functions (start()
vs setup()
, perhaps even complete()
vs finish()
though arguably that pair is more different and varies more across multi-part operations).
Also, I don't think start()
actually starts computing anything, it just prepares calling complete()
, so semantically that would be an argument for calling it setup()
as well.
Also, since Gilles mentions that we're likely to want to have multi-part interruptible sign/verify at some point, can we quickly check how the naming scheme would work for that? [Edit: should we have interruptible_setup()
to distinguish from non-interruptible multi-part?]
Btw, I've also been wondering if complete()
should be called something like compute()
(or perhaps progress()
)? Considering most of the time it will not complete the operation. (OTOH, there's logic in the fact that it only returns success when it did complete the operation, so that's consistent with the name indeed.)
Btw, I've also been wondering if complete() should be called something like compute() (or perhaps progress())? Considering most of the time it will not complete the operation. (OTOH, there's logic in the fact that it only returns success when it did complete the operation, so that's consistent with the name indeed.)
That was the logic: xxx_complete()
is named based on the behavior when it succeeds.
start() vs setup(), perhaps even complete() vs finish() though arguably that pair is more different and varies more across multi-part operation
complete()
is pretty different from finish()
, and I think this should be reflected in the naming. start()
is less different from setup()
, but I think it's clearer to have different names for start/complete and setup/update/finish or setup/update/verify.
All of the discussed changes that I saw as decided upon I have made in the implementation PR (#6737). Will revisit the ongoing discussions in the new year, and update this version if there is more review to be done here.
Should we have an implementation note somewhere in here that ops
should be key-value and data-value invariant, to avoid it becoming a side-channel?
I assume we no longer intend to merge this one, now that we have #6737.
We don't, but I still have to review the differences between this and #6737. I'm happy that the API #6737 is good enough to release as a beta, but I want to check for differences and re-read some of the discussions with the benefit of having seen the implementation.
This doesn't require keeping the PR open.
@paul-elliott-arm can this be closed?
Closing this for now as I think its reached its useful limit. Conversations about renaming functions etc can be done in new issues / PRs