mbedtls icon indicating copy to clipboard operation
mbedtls copied to clipboard

PSA Sign Hash (Interruptible) Design

Open paul-elliott-arm opened this issue 2 years ago • 2 comments

Description

A design for the psa_sign_hash (interruptible) interface. (Implementation to follow once design agreed)

Status

READY

Requires Backporting

NO

Migrations

NO

Todos

  • [ ] Tests

paul-elliott-arm avatar Sep 13 '22 16:09 paul-elliott-arm

CI Fail is due to 'missing' error code. I could fix this by defining it in file for the time being, if this is acceptable.

paul-elliott-arm avatar Sep 15 '22 08:09 paul-elliott-arm

You need to define the new error code, either in crypto_values.h or in crypto_extra.h.

gilles-peskine-arm avatar Sep 15 '22 08:09 gilles-peskine-arm

I had a read-through and left some notes, nothing major I think.

I really like the split between start and complete, which I think avoids a lot of problems in usage and in implementation. I think as a result, most of the "with the same parameters" requirements become superfluous, which is a good thing, but the documentation should reflect this.

IMO the main question is about the scope of this; is it possible that this will be extended:

  • To other operations? IMO yes, at least ECDH. Not a problem in practice, but perhaps naming should reflect this.
  • To the same operations based on other kinds of maths? Like RSA sign/verify (though perhaps it's a thing of the past in the controlled environments were super-constrained devices live - except perhaps RSA verify which is still fast, but then perhaps it doesn't need to be interruptible)? Or some PQ schemes (might be more likely)? In that case it's not really clear to me that having a single set_max_ops() setting with work. The duration of a "basic" operation already varies form curve to curve, but it's going to be wildly different for RSA and various PQ schemes. Do we need to allow one max_ops setting per family of algs? Or do we just hope that the super-constrained devices that are going to use this are likely to only use one family?

mpg avatar Sep 27 '22 08:09 mpg

Ah, just thought of more things right after posting my last comment. Is the global setting max_ops a good idea? Shouldn't this be per operation instead, like one of the following?

void psa_interruptible_set_max_ops( psa_sign_interruptible_context *ctx, size_t max_ops );
/* or */
psa_status_t psa_sign_hash_start( psa_sign_interruptible_context *ctx, size_t max_ops,
    /* ... */ );

This would neatly solve the issue about "basic operations" having different meanings for various curves, and potentially in the future for primitives based on various kinds of maths.

Also avoids questions like "Am I allowed to call set_max_ops() between start() and complete() and what would that mean?"

mpg avatar Sep 27 '22 08:09 mpg

Note however that making max_ops not a global setting would create a mismatch with the existing Mbed TLS implementation and API. However, I don't think that should prevent use from making that move if we think it's the right one: designing the PSA Crypto API is an opportunity to make the API better, such opportunities are rare and we should use them when we can. (We can probably extend the Mbed TLS API in a compatible way so that it supports this.)

mpg avatar Sep 27 '22 08:09 mpg

I've been thinking: on one hand, providing max_ops as an argument to start() has a lot going for it, as it's the point where you also pass the key, which basically determines what "basic operation" means, so this neatly solves the problem that "basic operation" has variable meaning by only specifying max_ops in a context that clarifies its meaning. This is great when the application does the signature/verification directly.

On the other hand, when the operation happens via another layer (X.509, TLS), this layer is now responsible for deciding what max_ops to pass. We could provide things like mbedtls_ssl_conf_set_interruptible_max_ops() but we run again into the problem that we don't know what "basic operation" means until the curve (or other) has been chosen, which will happen as part of the handshake - so we haven't solved the problem just moved it to other layers.

Now, just throwing ideas around, not saying they're any good, but trying to stir up discussion:

  1. TLS/X.509 could accept a callback function that takes as inputs a key type and bitsize, and return the value of max_ops to use;
  2. psa_interruptible_set_max_ops() could take as input a list of (key type, bitsize, max ops) triplets (kind of like the above, but at the PSA level and with a table instead of callback).
  3. The above two are just me overthinking the problem, people can just pick the lowest value in the above table and use it as they global setting: it's not a problem if functions return more often than they need to. (Edit: I guess this is the reasoning behind using 1 as the default value, which seems sensible to me.)

I think in practice, "how much time is it acceptable to block" does not depend on the operation but on the system: in that sense, a single global setting makes sense. The problem is, the API only provides "how many 'basic operations'" and how that translates to "how much time" depends on the key...

mpg avatar Sep 28 '22 11:09 mpg

Apologies for the force push. As per Gilles request I have not only made changes, but also moved various parts of the code into different locations, so it was going to be something of a mess anyway.

paul-elliott-arm avatar Oct 12 '22 12:10 paul-elliott-arm

I'm marking this as approved for design. We'll still make minor changes to the API, but the fundamentals are good.

gilles-peskine-arm avatar Dec 06 '22 18:12 gilles-peskine-arm

General note about this PR: it's in the Mbed TLS repo but is written as a specification that may have multiple implementations. At some point while implementing this, we'll need to replace all places that says "XXX is implementation defined" with a description of what our implementation actually does.

mpg avatar Dec 12 '22 09:12 mpg

I'm sorry for having such thought that late in the process, but I've just been thinking about the naming for start/complete functions: should rename start to setup for consistency with multi-part operations? setup() is usually the function that makes the operation "active" meaning from this point you need to either bring it to a successful completion or abort() it.

If this is similar enough to existing multi-part operations, the similarity should be reflected in the names chosen; just as well, if this is different enough, this should be reflected in the names. Note: currently we have similar names for the type xxx_operation_t and the function xxx_abort() but not for other functions (start() vs setup(), perhaps even complete() vs finish() though arguably that pair is more different and varies more across multi-part operations).

Also, I don't think start() actually starts computing anything, it just prepares calling complete(), so semantically that would be an argument for calling it setup() as well.

Also, since Gilles mentions that we're likely to want to have multi-part interruptible sign/verify at some point, can we quickly check how the naming scheme would work for that? [Edit: should we have interruptible_setup() to distinguish from non-interruptible multi-part?]

Btw, I've also been wondering if complete() should be called something like compute() (or perhaps progress())? Considering most of the time it will not complete the operation. (OTOH, there's logic in the fact that it only returns success when it did complete the operation, so that's consistent with the name indeed.)

mpg avatar Dec 12 '22 10:12 mpg

Btw, I've also been wondering if complete() should be called something like compute() (or perhaps progress())? Considering most of the time it will not complete the operation. (OTOH, there's logic in the fact that it only returns success when it did complete the operation, so that's consistent with the name indeed.)

That was the logic: xxx_complete() is named based on the behavior when it succeeds.

start() vs setup(), perhaps even complete() vs finish() though arguably that pair is more different and varies more across multi-part operation

complete() is pretty different from finish(), and I think this should be reflected in the naming. start() is less different from setup(), but I think it's clearer to have different names for start/complete and setup/update/finish or setup/update/verify.

gilles-peskine-arm avatar Dec 13 '22 10:12 gilles-peskine-arm

All of the discussed changes that I saw as decided upon I have made in the implementation PR (#6737). Will revisit the ongoing discussions in the new year, and update this version if there is more review to be done here.

paul-elliott-arm avatar Dec 18 '22 17:12 paul-elliott-arm

Should we have an implementation note somewhere in here that ops should be key-value and data-value invariant, to avoid it becoming a side-channel?

athoelke avatar Jan 25 '23 21:01 athoelke

I assume we no longer intend to merge this one, now that we have #6737.

mpg avatar Feb 16 '23 12:02 mpg

We don't, but I still have to review the differences between this and #6737. I'm happy that the API #6737 is good enough to release as a beta, but I want to check for differences and re-read some of the discussions with the benefit of having seen the implementation.

This doesn't require keeping the PR open.

gilles-peskine-arm avatar Feb 16 '23 18:02 gilles-peskine-arm

@paul-elliott-arm can this be closed?

daverodgman avatar May 12 '23 13:05 daverodgman

Closing this for now as I think its reached its useful limit. Conversations about renaming functions etc can be done in new issues / PRs

paul-elliott-arm avatar May 12 '23 13:05 paul-elliott-arm