libp11 icon indicating copy to clipboard operation
libp11 copied to clipboard

[BROKEN] Attempt to refcount the engine with EVP_PKEYs we give out

Open dwmw2 opened this issue 9 years ago • 35 comments

So... in https://github.com/openssl/openssl/pull/1643#issuecomment-250328877 @levitte points out that we are Doing It Wrong™. We should ensure that our engine still has a functional refcount, for each EVP_PKEY we give out.

This patch is a first attempt at doing that (excuse the libp11-int.h bit for the moment).

But there is a problem, and I'm hoping @levitte can help show us the right answer please.

Now that I use RSA_new_method(engine) as instructed, it crashes:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff71fe741 in RSA_free (r=0x9a2d30) at crypto/rsa/rsa_lib.c:137
137     if (r->meth->finish)
(gdb) bt
#0  0x00007ffff71fe741 in RSA_free (r=0x9a2d30) at crypto/rsa/rsa_lib.c:137
#1  0x00007ffff71fe6ca in RSA_new_method (engine=0x95ba30) at crypto/rsa/rsa_lib.c:120
#2  0x00007ffff71cc886 in pkcs11_get_rsa (key=0x9a3b70) at p11_rsa.c:184
#3  pkcs11_get_evp_key_rsa (key=0x9a3b70) at p11_rsa.c:243
(gdb) p r->meth
$3 = (const RSA_METHOD *) 0x0

Of course, r->meth is NULL. Because when I was setting engine methods with ENGINE_set_ECDSA(), OpenSSL was committing the heinous crime of actually trying to use them, as discussed in #107 — it was doing ephemeral ECDH for an ECDHE ciphersuite, and decided it'd use our engine methods, which are only valid for the EVP_PKEYs we create ourselves, for its own software key.

So what is an engine supposed to do here? It looks like we're required to use ENGINE_set_ECDSA() and ENGINE_set_RSA() if we want the refcounting to work, and we're required not to use them if we don't want to be asked to perform operations on objects that aren't ours.

Is there an ENGINE_set_ECDSA_but_not_for_default() that I'm missing... or something else I'm missing? @levitte?

dwmw2 avatar Sep 29 '16 10:09 dwmw2

One option perhaps, if OpenSSL doesn't have a coherent answer to this, is that every case where we currently fail with PKCS11_ALIEN_KEY should just fall through to the previous default method — basically calling back into OpenSSL:? I don't like that much, mind you...

dwmw2 avatar Sep 29 '16 10:09 dwmw2

With all due respect, this PR is insane. Does any other engine do it this way?

mtrojnar avatar Sep 29 '16 10:09 mtrojnar

Another option for OpenSSL <= 1.0.2 might be to keep using RSA_new() but to manually set its ->engine for ourselves on the returned object. Can't do that in 1.1+ though.

dwmw2 avatar Sep 29 '16 10:09 dwmw2

With all due respect, this PR is insane. Does any other engine do it this way?

Allegedly they all do. I jjust have the strong feeling that I'm missing something here.

I feel like I should go back to Tuesday and my initial "WTF, I asked curl to use an RSA key from PKCS#11 — why am I getting a PKCS11_ALIEN_KEY error from pkcs11_ecdsa_sign_sig()?", and start all over again...

dwmw2 avatar Sep 29 '16 10:09 dwmw2

Allegedly they all do

I guess I should have been more precise...

I checked OpenSSL 1.1.0 engines for RSA_new_method() invocations. They are used by e_capi.c and e_chil.c. None of them stores the ENGINE reference in any data structure. Instead, they just directly use the ENGINE pointer provided in load_privkey_function/load_pubkey_function.

In other words, instead of adding a new function to the libp11 API, we should use the opportunity to remove the obsolete RSA and EC interface (which was already on the roadmap).

It is time to fork libp11 0.5.0, and add ENGINE *engine to the parameters of PKCS11_get_private_key() and PKCS11_get_public_key().

mtrojnar avatar Sep 29 '16 11:09 mtrojnar

... or even better to obsolete them (for compatibility) and add new PKCS11_engine_get_private_key() and PKCS11_engine_get_public_key().

mtrojnar avatar Sep 29 '16 11:09 mtrojnar

That's bikeshedding though. However the ENGINE * gets to RSA_new_method () we still have the fundamental problem.

dwmw2 avatar Sep 29 '16 11:09 dwmw2

I can see quite a few problems here. Which of them is the fundamental one for you? I guess we can implement the fall through to the OpenSSL default methods fairly easily. Then we can proceed with getting the engine refcount right. Does this plan make sense?

mtrojnar avatar Sep 29 '16 11:09 mtrojnar

The one I was calling the "fundamental" problem was that fact that (AFAICT) OpenSSL simultaneously requires that we

  • MUST use ENGINE_set_RSA() in order for the ref counting via RSA_new_method() to work, and
  • MUST NOT use ENGINE_set_RSA() if we don't want our method to be made the default, and then invoked even for software keys we can't handle.

That just seems wring to me, and I would love @levitte to confirm that we aren't missing something, before we proceed down that road.

Falling through to the OpenSSL software methods is not an ideal solution. What if another engine is present which genuinely does register generic methods for software keys, like the AESNI engine used to? Do we fall through to that? What if it's loaded before/after us, and then do things work right if it's unloaded before/after us? There is a can of worms of semantics there that we just shouldn't have to be involved with.

I'd be much happier in <=1.0.2 just to manually set RSA->engine after calling RSA_new() and perhaps ask for an accessor which lets us do that in 1.1. But I would really like guidance from the OpenSSL side on how this is supposed to work.

dwmw2 avatar Sep 29 '16 12:09 dwmw2

Existing engines (e_capi.c, e_chil.c, e_dasync.c) do call ENGINE_set_RSA(). If this is a problem then it is bigger than just our engine.

mtrojnar avatar Sep 29 '16 12:09 mtrojnar

So this didn't actually start to bite us until we supported EC, and got invoked to handle ECDHE cipher suites. Maybe they just haven't updated yet? We can probably contrive a test case to break this with RSA...

dwmw2 avatar Sep 29 '16 12:09 dwmw2

@mtrojnar, functions such as RSA_new_method() have a ENGINE_init() when an engine is provided (i.e. when the engine parameter is non-NULL), and they do attach the pointer to the engine in the data structure. That's how you get a functional reference to the engine associated with the key.

See here for how RSA_new_method() does it.

levitte avatar Sep 29 '16 13:09 levitte

@levitte, you miss the point. How do we reconcile the MUST vs. MUST NOT bullet points a few comments up?

dwmw2 avatar Sep 29 '16 13:09 dwmw2

Sorry, my comment there was a side note and an answer to a question @mtrojnar asked higher up.

I'm currently looking through #107 and trying to understand what's really going on, 'cause I think there are a number of assumptions, both yours and mine, running high... I'll be back in a few.

levitte avatar Sep 29 '16 13:09 levitte

You know, if I understand things correctly, the real issue here is that OpenSSL's apps/apps.c:setup_engine() calls ENGINE_set_default(). Without that call, the methods you set in the engine will only be used on demand (such as when calling RSA_new_method() with a non-NULL parameter), not by default (which is the actual issue in #107). Does that make sense?

A bit of historical background: The ENGINE module started out with wanting crypto acceleration, and with that, making the engine's methods default makes a lot of sense. Then we added support for hw stored keys, and ran with the assumption that if you use an engine that has support for hw stored keys, you'd want to use those instead of pem files, period. We may have had to fiddle with the creation of temporary RSA keys at some point (my memory is blurry about that), but...

So, to support hardware stored keys, there must be a corresponding method for that type of key, so that whenever the parts of the key that's hidden in hardware (typically, the private bits), the engine method's functions are used. Does that make sense to you?

Back to the issue with having engine methods becoming default - obviously, that's where the actual issue is - one solution could be to remove the call to ENGINE_set_default() in the openssl application, but that won't help against all other programs making the same call. Another solution is for the engine to recognise if this is an engine stored key, and do its own operation in that case, or otherwise to call the corresponding OpenSSL function (in the RSA case, that would be RSA_meth_get_priv_enc(RSA_PKCS1_OpenSSL()) for private encryption, in OpenSSL 1.1.0).

levitte avatar Sep 29 '16 14:09 levitte

You know, if I understand things correctly, the real issue here is that OpenSSL's apps/apps.c:setup_engine() calls ENGINE_set_default(). … Does that make sense?

Completely.

one solution could be to remove the call to ENGINE_set_default() in the openssl application, but that won't help against all other programs making the same call.

Right. I don't fancy a bombing run on all the applications which have cargo-culted it from there and trying to fix them not to. I didn't even have much luck with the first one I tried to fix.

Another solution is for the engine to recognise if this is an engine stored key, and do its own operation in that case, or otherwise to call the corresponding OpenSSL function (in the RSA case, that would be RSA_meth_get_priv_enc(RSA_PKCS1_OpenSSL()) for private encryption, in OpenSSL 1.1.0).

Kind of icky. What if another engine genuinely does provide a generic replacement for those functions on software keys (a bit like the AESNI engine used to provide software acceleration)? Now we get to deal with issues of priority and what happens if the various engines are loaded and unloaded in different orders. Should an engine just use the method returned by ENGINE_get_default_RSA() at the moment it's loaded, and call through to that for ever more, for keys it doesn't like?

I can live with that as a workaround but don't like it as a "solution" going forward.

But actually, I have a better workaround for OpenSSL <= 1.0.2. I don't use RSA_new_method(); instead I just set rsa->engine on the key that's returned by RSA_new(). You'll note engine_pkcs11 was installing its own methods with RSA_set_method() anyway. Then we don't need to use ENGINE_set_RSA(), and all is well because we can never become the default.

That workaround only falls down on 1.1 where the keys are opaque. If you give us an accessor for RSA_set_engine() and EC_KEY_set_engine() then that gives us a viable workaround there too. And we can come up with a nicer fix for later. Like having engines indicate a mask (matching the mask in ENGINE_set_default() of capabilities they are willing to be default for.

dwmw2 avatar Sep 29 '16 14:09 dwmw2

Around some of your questions and assumptions: the ENGINE stuff was never written to support the use of several engines at the same time. It's really a one-engine-per-process deal...

levitte avatar Sep 29 '16 15:09 levitte

Mind you, it is possible to use several engines, as long as they don't compete on the supported algorithms.

levitte avatar Sep 29 '16 15:09 levitte

the ENGINE stuff was never written to support the use of several engines at the same time. It's really a one-engine-per-process deal...

So... within the same process (which might include multiple protocol plugins, multiple accounts within a mail client or IM program, etc.) we're only supposed to load one engine? So if I need to use a key from a TPM for one SSL connection, and a key from PKCS#11 for another, that isn't supposed to work?

dwmw2 avatar Sep 29 '16 15:09 dwmw2

But actually, I have a better workaround for OpenSSL <= 1.0.2. I don't use RSA_new_method(); instead I just set rsa->engine on the key that's returned. You'll note engine_pkcs11 was installing its own methods with RSA_set_method() anyway. Then we don't need to use ENGINE_set_RSA(), and all is well because we can never become the default.

Don't forget to ENGINE_init().

It's kinda hackish, but I can see the benefit there.

That workaround only falls down on 1.1 where the keys are opaque. If you give us an accessor for RSA_set_engine() and EC_KEY_set_engine() then that gives us a viable workaround there too.

Frankly, that feels kinda icky...

And we can come up with a nicer fix for later. Like having engines indicate a mask (matching the mask in ENGINE_set_default()) of capabilities they are willing to be default for.

I've thoughts in the general direction too, and would largely prefer that to RSA_set_engine() et al. Regardless, this is something that won't happen before OpenSSL 1.1.1. Our versioning policy forbids addition of functions in letter updates, so it just won't happen in the 1.1.0 series.

levitte avatar Sep 29 '16 15:09 levitte

It's kinda hackish, but I can see the benefit there.

Yeah. And it's fairly close to what we're already doing. I won't say I like it more... but I dislike it less than the option of calling through to the internal software methods. If this worked in 1.1 too, it'd be a no-brainer for me. But...

Frankly, that feels kinda icky...

Oh, it gets more icky. Your versioning policy forbids you from giving me that accessor. But does it also forbid you from changing the binary structure? For the limited case of OpenSSL 1.1.x can we just hold our nose and "know" precisely where in the ec_key_st and the rsa_st the engine pointers reside?

Ick :) And besides, you made those opaque so that arguably you do have the freedom to mess with them. And even if that weren't the case, I think it pushes me over the edge and now I do dislike this solution more than I dislike the option of just calling through to the previous default methods.

So let's do that. I'll call ENGINE_get_default_RSA() and ENGINE_get_default_EC() when the engine is loaded (before it can have been made the default), and call through to them in the cases where we'd currently bail out with that PKCS11_ALIEN_KEY error.

dwmw2 avatar Sep 29 '16 15:09 dwmw2

And you'll do that mask, or some other better solution for OpenSSL 1.$NEXT.

Although actually, my plan of record is that I won't care about all this engine nonsense in OpenSSL 1.$NEXT because we had an approval in principle that we would add PKCS#11 support to crypto/pkcs11 as a first-class citizen after 1.1 was out the door :)

dwmw2 avatar Sep 29 '16 15:09 dwmw2

You're thinking that crypto/pkcs11 would become an alternative to the ENGINE API, right? I do agree with that motion, have been thinking along those lines for a while. That would make the "plugin" interface much more OpenSSL version agnostic.

levitte avatar Sep 29 '16 16:09 levitte

You're thinking that crypto/pkcs11 would become an alternative to the ENGINE API, right?

Note that we already have an alternative to the ENGINE API. Applications can link directly to libp11 and use its API directly. That's what OpenConnect does, for example, because it gives a little more flexibility than the ENGINE.

I've rounded up the historical contributors to libp11 and have obtained permission to relicense their code under the OpenSSL licence. And apart from the licence it looks like it was basically written to be OpenSSL code.

So the straw man initial proposal would be "take libp11 and drop it as-is into crypto/pkcs11".

Obviously I really do mean that as a straw man — there would be ample review feedback if I literally tried to do just that. But once we drop the bits which are already marked deprecated, do a review pass over the rest for sanity, and try to add it to crypto/pkcs11 in bite-sized reviewable pieces... yeah, something like that.

There's merit in OpenSSL $NEXT being compatible with those applications which were using libp11 before, if we can achieve that.

dwmw2 avatar Sep 29 '16 16:09 dwmw2

And once it's physically present, then we can look at integrating it properly, and interfaces to make OpenSSL Do The Right Thing, automatically accepting a PKCS#11 URI where it would accept filenames for loading keys and certs, etc. — generally complying with http://david.woodhou.se/draft-woodhouse-cert-best-practice.html

dwmw2 avatar Sep 29 '16 16:09 dwmw2

Which means that we really need to write a storage wrapper that will accept URIs, parse them and dispatch them to different routines depending on the URI schema.

levitte avatar Sep 29 '16 17:09 levitte

We're getting very off topic, by the way

levitte avatar Sep 29 '16 17:09 levitte

I think that the conclusion of all this is:

  • that openssl#1644 is necessary for your planned hacking of the keys for the OpenSSL 1.0.2 series.
  • that your planned hack is impossible for the 1.1.0 series.
  • that there should be a solution for 1.1.$next that allows to set a separate method specifically associated to load keys, that will not be used as a default method.
  • [kinda off topic] that we should plan for crypto/pkcs11 for 1.$next (not the same as 1.1.$next)

Seems right to you?

Note that it's possible that we can get crypto/pkcs11 into 1.1.$next, but I currently doubt that, as it will be a major change in functionality and general key storage handling.

levitte avatar Sep 29 '16 17:09 levitte

Looks good to me.

dwmw2 avatar Sep 29 '16 17:09 dwmw2

• that your planned hack is impossible for the 1.1.0 series.

The result of that, of course, is that I have to re-think my "planned hack", and at least for 1.1.0 I'll need to do the pass-through-to-internal-method trick.

Which means I might do that for older versions too just to be consistent.... or maybe not because EC_KEY vs. ECDSA+ECDH is a mess of ifdefs anyway so maybe it's no cleaner to do it "the same".

dwmw2 avatar Sep 29 '16 18:09 dwmw2