
Consider supporting ca_certs specified as file-like object

Open ecbftw opened this issue 5 years ago • 9 comments

Hi there. Thanks for a great library.

Here's what I'm trying to do: I'd like to provide users a trust-on-first-use (TOFU) mechanism for self-signed certificates in a complex application. I'm storing observed certificates (self-signed, or signed by a CA that isn't public) in my database and then I give the user the option whether or not to trust certain certificates. (Yes, this could be risky for the user, but the reality of the enterprise world is that no one wants to centrally manage internal CAs and TOFU is much safer than disabling validation. Do you manually verify every SSH host key you see??? ;-) )

From there, I'd like to pass a CA bundle to urllib3 (or requests, etc) via either a simple buffer or file-like object. Lo and behold, others have asked for this in #474. Back when that issue was closed in 2016, there didn't seem to be an easily supportable way to do this with the standard python library. However, I think the world may have changed since then. Consider that the standard library's SSLContext.load_verify_locations method supports a cadata argument that we could use for this. This argument was added in Python 3.4. Python 3.3 support ended on 2017-09-29.

Would it make sense to implement this fully in urllib3 now? My suggestion is:

A) The caller may provide ca_certs as a file-like object, which is easily distinguishable from a string that specifies a path.
B) If a file-like object is received and urllib3 is using pyOpenSSL, then use the workaround described in #474.
C) If a file-like object is received and urllib3 is using the standard library, read the full contents of the file-like object and pass them as the cadata argument.
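A minimal sketch of steps A and C, using only the standard library's ssl module. The helper name `load_ca_certs` is hypothetical and is not an urllib3 API. It assumes that a str or path-like value names a bundle on disk, that anything else exposes a `read()` method, and that bytes returned by `read()` are PEM text (cadata itself would interpret bytes as DER).

```python
import os
import ssl


def load_ca_certs(context, ca_certs):
    """Load CA certificates from a path or a file-like object (hypothetical helper)."""
    if isinstance(ca_certs, (str, os.PathLike)):
        # Existing behaviour: the value names a CA bundle file on disk.
        context.load_verify_locations(cafile=os.fspath(ca_certs))
        return
    # File-like object: read everything and hand it to cadata (Python 3.4+).
    data = ca_certs.read()
    if isinstance(data, bytes):
        # Assumption: binary file objects hold PEM text. cadata treats str as
        # PEM and bytes as DER, so decode before passing it on.
        data = data.decode("ascii")
    context.load_verify_locations(cadata=data)
```

A context prepared this way could then be handed to urllib3 with `urllib3.PoolManager(ssl_context=context)`, so the bundle never touches the disk.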

Thoughts?

ecbftw avatar Dec 04 '19 23:12 ecbftw

What do you think about using assert_fingerprint? First calculate the SHA256 of the certificate, then pass like this to a PoolManager:

import urllib3

http = urllib3.PoolManager(
    assert_fingerprint="6FA628EDA9F8679B08F95FD7116E35D077DBB84F8108623E660E6683FDD77556",
    ...
)
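For the "calculate the SHA256 of the certificate" step, here is a rough sketch using only the standard library. The function names are hypothetical. `fetch_fingerprint` relies on `ssl.get_server_certificate`, which does not validate the certificate unless CA certificates are supplied, so the result is only as trustworthy as the network path on first contact.

```python
import hashlib
import ssl


def sha256_fingerprint(pem_cert):
    """Return the uppercase hex SHA-256 digest of a PEM certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest().upper()


def fetch_fingerprint(host, port=443):
    """Fetch a server's certificate without verifying it and fingerprint it."""
    # No ca_certs argument is given, so the certificate is not validated here;
    # this is the "first use" observation in a TOFU scheme.
    pem = ssl.get_server_certificate((host, port))
    return sha256_fingerprint(pem)
```

The returned string has the same shape as the assert_fingerprint value in the snippet above.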

You might also find my blog post which mentions a lot of things about TOFU / fingerprinting interesting: https://sethmlarson.dev/blog/2019-11-26/designing-for-real-world-https

sethmlarson avatar Dec 05 '19 04:12 sethmlarson

Hi @sethmlarson, thanks for the suggestion. I wasn't aware of these assert_... options in urllib3, but I'm not sure it will give me the flexibility that I desire.

Consider a case where a service starts off using self-signed certs. We do the TOFU thing with assert_fingerprint and maybe assert_hostname, and that works OK. But then later the user of my software decides to do it the right way and generates their own CA certificate. They install it in my software and then proceed to replace various service certificates with ones signed by that CA. At that point, my software should realize the certs can be verified off of the new CA. But with assert_fingerprint, the verification would fail even though a "better" certificate is now installed.

There are likely workarounds for this (e.g. try to connect twice with different settings, etc.), but they are slow/ugly, and it would be far more flexible if I could customize my CA list up front according to the logic I deem appropriate and then connect once with the appropriately groomed CA bundle.

Note that in my application, I'll be storing perhaps many thousands of self-signed certificates and doing TOFU against them. At this scale, one can't prompt the user for every little change (such as a transition from TOFU to a CA-signed cert). It has to be carefully thought out to be reasonably secure while still manageable.

ecbftw avatar Dec 14 '19 20:12 ecbftw

@ecbftw the fundamental limitation is in Python's own ssl library. Last I checked, it didn't even allow for this.

sigmavirus24 avatar Dec 15 '19 00:12 sigmavirus24

@sigmavirus24 Right, that was true until Python 3.4 when the cadata argument was added. See my explanation in the first post. All supported versions of Python 3 now have this, which was not the case back when #474 was closed.

ecbftw avatar Dec 15 '19 00:12 ecbftw

cadata doesn't add any additional functionality to load_verify_locations(). It's a mechanism where certificates can be loaded into the SSLContext without requiring the filesystem.

I don't think the mechanism you're describing is possible without attempting multiple connections with two different SSLContext objects, one configured with CA certificates and one configured to not verify the chain of trust and to instead verify the signature of the peer certificate.
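To make the two-context approach concrete, a sketch of how the pair might be built with the standard library. The function name `build_contexts` is hypothetical, and the retry logic (try the first context, fall back to the second together with assert_fingerprint) is left to the caller.

```python
import ssl


def build_contexts(ca_bundle_pem=None):
    """Return (chain_ctx, pinned_ctx) for the two connection attempts."""
    # Context 1: normal chain-of-trust and hostname verification.
    chain_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_bundle_pem:
        chain_ctx.load_verify_locations(cadata=ca_bundle_pem)

    # Context 2: no chain verification. The caller must pin the peer
    # certificate itself, e.g. with urllib3's assert_fingerprint.
    pinned_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    pinned_ctx.check_hostname = False  # must be disabled before CERT_NONE
    pinned_ctx.verify_mode = ssl.CERT_NONE
    return chain_ctx, pinned_ctx
```

The second context is unsafe on its own; it only makes sense when every connection that uses it also checks a pinned fingerprint.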

sethmlarson avatar Dec 16 '19 15:12 sethmlarson

Yeah, so the whole point here is that I find it really crazy that to provide a custom bundle, I have to write it to disk. I want to dynamically generate my bundle and pass it as a parameter. What I was suggesting is a file handle so I can just do StringIO while being backward compatible. cadata is just nice so urllib3 wouldn't have to write anything to disk either. As it stands now, I have no choice but to use something like NamedTemporaryFile.
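For reference, the NamedTemporaryFile workaround looks roughly like this. The helper name `temporary_ca_bundle` is hypothetical; it assumes the bundle is a PEM string generated in memory.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_ca_bundle(pem_bundle):
    """Write a PEM bundle to a temporary file and yield its path."""
    # delete=False so the file can be reopened by name on every platform.
    tmp = tempfile.NamedTemporaryFile("w", suffix=".pem", delete=False)
    try:
        tmp.write(pem_bundle)
        tmp.close()
        yield tmp.name
    finally:
        tmp.close()
        os.unlink(tmp.name)
```

Inside the `with` block the path can be passed as `urllib3.PoolManager(ca_certs=path)`. Requests should be made before the block exits, since the file is removed afterwards.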

If you set the X509_V_FLAG_PARTIAL_CHAIN verify flag, then you can present a CA bundle that includes truly self-signed certificates and it will verify fine. (What I mean by truly self-signed, is that it is a single server certificate that also signed itself.)
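With the standard library, that flag can be set as sketched below. This assumes a Python whose ssl module exposes `ssl.VERIFY_X509_PARTIAL_CHAIN` (documented as new in 3.10); the function name `partial_chain_context` is hypothetical.

```python
import ssl


def partial_chain_context(ca_bundle_pem=None):
    """Build a client context that accepts non-root certificates as trust anchors."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # VERIFY_X509_PARTIAL_CHAIN maps to OpenSSL's X509_V_FLAG_PARTIAL_CHAIN.
    flag = getattr(ssl, "VERIFY_X509_PARTIAL_CHAIN", None)
    if flag is None:
        raise RuntimeError("this Python's ssl module lacks VERIFY_X509_PARTIAL_CHAIN")
    ctx.verify_flags |= flag
    if ca_bundle_pem:
        # The bundle may contain self-signed server certificates directly.
        ctx.load_verify_locations(cadata=ca_bundle_pem)
    return ctx
```

Hostname checking stays enabled in this sketch, so a trusted self-signed certificate still has to match the name being connected to.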

However, if you provide a bundle that includes a server certificate that is signed by a custom CA (and that custom CA isn't in your bundle), then this approach doesn't work. So in that case, you're right, you're not really able to do TOFU that way.

I'm finding that basically... OpenSSL kinda sucks at this. It just isn't flexible for CA management. I still think my suggestion is an improvement in flexibility and shouldn't be discounted, but to achieve what I want, my only current option is to use pyOpenSSL with an OpenSSL.SSL.Context() and call the set_verify() method to set a callback. Then do the certificate validation by hand. I think that works how I want it, but now I'm not sure how to use this SSL socket I created with Requests or urllib3. Any tips? Can I subclass a PoolManager or Requests adapter?

Thanks for your help.

ecbftw avatar Dec 19 '19 00:12 ecbftw

I have come up with a fully working solution that:

  • Forces urllib3 to use pyOpenSSL via the contrib module (which seems like a semi-legit thing)
  • Implements a pyOpenSSL set_verify callback method to validate certificates
  • The custom callback method is able to validate certificates based on a relaxed form of the traditional CA chain, and failing that, is able to perform TOFU validation on certificates
  • All validation happens during a single TLS handshake, not requiring reconnects
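The callback idea in the list above might look something like the following. This is a simplified sketch, not the solution described in this comment: the names `make_verify_callback` and `build_context` are hypothetical, the pin store is assumed to be a set of uppercase hex SHA-256 digests, and the wiring into urllib3 is omitted.

```python
def make_verify_callback(trusted_fingerprints):
    """Build a pyOpenSSL verify callback that falls back to TOFU pins.

    trusted_fingerprints: set of SHA-256 digests, uppercase hex, no colons.
    """
    def verify(connection, x509, errnum, errdepth, ok):
        if ok:
            # OpenSSL's own chain validation accepted this certificate.
            return True
        if errdepth != 0:
            # Only the leaf certificate may be rescued by a pin in this sketch.
            return False
        # X509.digest returns colon-separated hex bytes, e.g. b"AB:CD:..."
        digest = x509.digest("sha256").decode("ascii").replace(":", "")
        return digest in trusted_fingerprints

    return verify


def build_context(trusted_fingerprints):
    """Create a pyOpenSSL context that uses the callback above."""
    # Imported lazily so the callback logic can be exercised without pyOpenSSL.
    from OpenSSL import SSL

    ctx = SSL.Context(SSL.TLS_METHOD)
    ctx.set_verify(SSL.VERIFY_PEER, make_verify_callback(trusted_fingerprints))
    return ctx
```

Because the callback runs inside the handshake, the pin check and the chain check share one connection. Hostname matching is not covered here and would need separate handling.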

This is all wonderful for my end user, but it requires several monkey patches and isn't exactly elegant in places. One thing that would help tremendously is if there was a much easier way to override the pyOpenSSL callback method passed to set_verify. Any thoughts on this?

ecbftw avatar Jan 09 '20 03:01 ecbftw

This advice is exactly what I need. In my scenario, I need to keep making requests to a number of services whose certificate content is stored in the database and can change. Right now I have to write the certificate to disk before making a request. To avoid hurting performance, I don't refresh the certificate on disk every time, which means an expired certificate may end up being used.

barnettZQG avatar Apr 22 '20 12:04 barnettZQG

pyOpenSSL support is deprecated and will be removed in a future 2.x release of urllib3 (https://github.com/urllib3/urllib3/issues/2691).

IvanLauLinTiong avatar Aug 11 '22 15:08 IvanLauLinTiong