Consider using system trust stores by default in 3.0.0.
It's been raised repeatedly, mostly by people using Linux systems, that it's annoying that requests doesn't use the system trust store and instead uses the one that certifi ships. This is an understandable position. I have some personal attachment to the certifi approach, but the other side of that argument definitely has a reasonable position too. For this reason, I'd like us to look into whether we should use the system trust store by default, and make certifi's bundle a fallback option.
I have some caveats here:
- If we move to the system trust store, we must do so on all platforms: Linux must not be its own special snowflake.
- We must have broad-based support for Linux and Windows.
- We must be able to fall back to certifi cleanly.
Right now it seems like the best route to achieving this would be to use certitude. This currently has support for dynamically generating the cert bundle OpenSSL needs directly from the system keychain on OS X. If we added Linux and Windows support to that library, we may have the opportunity to switch to using certitude.
Given @kennethreitz's bundling policy, we probably cannot unconditionally switch to certitude, because certitude depends on cryptography (at least on OS X). However, certitude could take the current privileged position that certifi takes, or be a higher priority than certifi, as an optional dependency that is used if present on the system.
Thoughts? This is currently a RFC, so please comment if you have opinions. /cc @sigmavirus24 @alex @kennethreitz @dstufft @glyph @reaperhulk @morganfainberg
I think the system trust stores (or not) essentially boils down to whether you want requests to act the same across platforms, or whether you want it to act in line with the platform it is running on. I do not think that either of these options is wrong (or right); they are just different trade-offs.
I think that it's not as simple on Windows as it is on Linux or OSX (although @tiran might have a better idea). I think that Windows doesn't ship with all of the certificates available, and you have to do something (use WinHTTP?) to get it to download additional certificates on demand. I think that means that on a brand new Windows install, if you attempt to dump the certificate store, it will be missing a great many certificates.
On Linux, you still have the problem that there isn't one single set location for the certificate files, the best you can do is try to heuristically guess at where it might be. This gets better on Python 2.7.9+ and Python 3.4+ since you can use ssl.get_default_verify_paths() to get what the default paths are, but you can't rely on that unless you drop 2.6, 2.7.8, and 3.3. In pip we attempt to discover the location of the system trust store (just by looping over some common file locations) and if we can't find it we fall back to certifi, and one problem that has come up is that sometimes we'll find a location, but it's an old outdated copy that isn't being updated by anything. People then get really confused because it works in their browser, it works with requests, but it doesn't in pip.
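A minimal sketch of that kind of discovery, for illustration only (the path list below is representative rather than pip's exact list, and find_system_bundle is a made-up helper):

import os
import ssl

# Representative well-known bundle locations; real systems vary.
CANDIDATE_BUNDLES = [
    "/etc/ssl/certs/ca-certificates.crt",      # Debian/Ubuntu
    "/etc/pki/tls/certs/ca-bundle.crt",        # Fedora/RHEL/CentOS
    "/etc/ssl/ca-bundle.pem",                  # openSUSE
    "/usr/local/share/certs/ca-root-nss.crt",  # FreeBSD, if installed
]

def find_system_bundle():
    # On 2.7.9+/3.4+ we can ask OpenSSL where it was told to look,
    # though the reported cafile may be absent or stale.
    if hasattr(ssl, "get_default_verify_paths"):
        cafile = ssl.get_default_verify_paths().cafile
        if cafile and os.path.isfile(cafile):
            return cafile
    for path in CANDIDATE_BUNDLES:
        if os.path.isfile(path):
            return path
    return None  # caller falls back to certifi's bundle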
I assume the fallback to certifi ensures that things will still work correctly on platforms that either don't ship certificates at all, or don't ship them by default and they aren't installed? If so, that's another possible niggle here that you'd want to think about. Some platforms, like say FreeBSD, don't ship them by default at all. So it's possible that people will have a requests-using application running just fine without the FreeBSD certificates installed; they then install them (explicitly or implicitly) and suddenly they trust something different and the behavior of the program changes.
Anyways, the desire seems reasonable to me and, if all of the little niggles get worked out, it really just comes down to a question of if requests wants to fall on the side of "fitting in" with a particular platform, or if it wants to prefer cross platform uniformity.
I think the system trust stores (or not) essentially boils down to whether you want requests to act the same across platforms, or whether you want it to act in line with the platform it is running on. I do not think that either of these options is wrong (or right); they are just different trade-offs.
I wouldn't say that either option is completely wrong, but I do think that using the platform trust store is significantly right-er, due to the availability of tooling to adjust trust roots on the platform and the relative unavailability of any such tooling for requests or certifi. If you go into Keychain Access to add an anchor (or the Windows equivalent) nothing about requests makes it seem like it would be special, that it would be using a different set of trust roots than you had already configured for everything else.
It depends on whether your audience is people who are familiar with a particular platform or not. I have no idea how to manage the trust store on Windows, but I know how to manage the trust store for requests, because requests currently chooses being internally consistent across platforms over being externally consistent with any particular platform.
IOW this change makes it easier for people who are familiar with the system the software is running on at the cost of people who are not.
On Jan 11, 2016, at 4:29 PM, Glyph [email protected] wrote:
I wouldn't say that either option is completely wrong, but I do think that using the platform trust store is significantly right-er
IOW this change makes it easier for people who are familiar with the system the software is running on at the cost of people who are not.
In the abstract, I disagree. Of course, we may have distribution or build toolchain issues which make people have to care about this fact, but if it works properly (pip installs without argument, doesn't require C compiler shenanigans for the end user) then what is the penalty to people who are not familiar with the platform?
It forces them to learn the differences of every platform they are running on.
On Jan 11, 2016, at 4:59 PM, Glyph [email protected] wrote:
what is the penalty to people who are not familiar with the platform?
So I am in favour of using the system store in general, because in most cases [outside of dev], if you're relying on requests or a similar library, you're going to expect it to work like the rest of the system's tooling. Asking someone to learn the tooling for the system they are deploying on is not unreasonable. If an application uses requests and needs to trust a specific cert (end-user story here), it will usually handle installing that cert for the platform at install time (i.e. on OS X or Windows).
From a developer perspective, it becomes a little more difficult but still not insurmountable as long as we approach this in a way that the developer has clear methods to continue with the same behaviour as today.
As discussed in IRC, perhaps the easiest method to ensure sanity is to really finish up and polish certitude (? was this the tool discussed?) so we can encapsulate the platform/system specifics clearly and try to ensure requests' logic is the same on all of the platforms.
It forces them to learn the differences of every platform they are running on.
How so? By default, it ought to get the trust roots you expect on every platform. It's not like I need to learn new and exciting things when I launch Safari vs. IE just to type https://...
I think that it's not as simple on Windows as it is on Linux or OSX (although @tiran might have a better idea). I think that Windows doesn't ship with all of the certificates available, and you have to do something (use WinHTTP?) to get it to download additional certificates on demand. I think that means that on a brand new Windows install, if you attempt to dump the certificate store, it will be missing a great many certificates.
You are right about this. I have verified it on my nearly-pristine Windows VM; in a Python prompt, I do:
>>> import wincertstore
>>> print(len(list(wincertstore.CertSystemStore("ROOT"))))
and get "21". Visit some HTTPS websites, up-arrow/enter in the python interpreter, and now I get "23".
There's some technical documentation here:
https://technet.microsoft.com/en-us/library/bb457160.aspx
And some more explanation here:
http://unmitigatedrisk.com/?p=259
Frustratingly, I can't find an API that just tells it to grab the certificate store; it seems that verifying a certificate chain that you don't have the root to is the only way that it adds certificates, and it adds them one at a time as necessary. It baffles me that Microsoft seems to consider storage for certificates a scarce resource.
After hours of scouring MSDN, I give up. Hopefully someone else can answer this question: https://stackoverflow.com/questions/34732586/is-there-an-api-to-pre-retrieve-the-list-of-trusted-root-certificates-on-windows
In my experience with pip, which attempts to discover the system store and falls back to a bundled copy if it can't find one, I have had to learn how the system store works on platforms that I have no intention of ever running. This is for a fairly simple method of detection (looking for files on the system), but it absolutely ends up that way. The simple fact is, in my experience most people have absolutely no idea how their system manages a trust store (or whether it manages a trust store at all).
On Jan 11, 2016, at 5:22 PM, Glyph [email protected] wrote:
How so? By default, it ought to get the trust roots you expect on every platform. It's not like I need to learn new and exciting things when I launch Safari vs. IE just to type https://...
FWIW, if I could prevent downstream redistributors from forcing pip to use the system store, I would revert the change to look in system locations immediately and only ever use a bundled copy. The UX of that tends to be so much nicer. The only reason we started to trust the system trust stores is that redistributors patch pip to use them anyway, so you end up in a situation where people get different trust stores based on where their copy of pip came from (which is likely also a concern for requests).
As an additional datapoint, if I remember correctly, the browsers which are not shipped with the OS tend not to use the system trust store either. According to Chrome's Root Certificate Policy, they will use the system trust store on OS X and on Windows, but they won't use it on Linux. I believe that even where Chrome does use the system store, they still layer their own trust on top of it to allow them to blacklist certificates if need be. I assume this capability is in place because they do not wholly trust the OS trust stores to remove compromised certificates. If I recall correctly, Firefox does not use the system trust store at all on any OS.
Another question is what the "root trust store" on Linux even is. The closest you can get is wherever the system-provided OpenSSL (assuming they even provide OpenSSL) is configured to point. However, AFAIK there is no way to determine whether you're using a system-provided OpenSSL or a copy that someone installed (perhaps via Anaconda?), and the additional copy may have stale certificates or no certificates available. If it's stale certificates, then you've successfully lowered the security of requests users by attempting to follow the system trust store. If it's an empty trust store, how do you tell the difference between "empty because I trust nothing" and "empty because my copy of OpenSSL isn't shipping certificates"? Do I end up managing the certificates I trust using my OS, except when I want to trust nothing, in which case I manage them using requests?
I've also found an article that talks about trying to use the platform certificate trust store on Linux (see here). I've not personally verified the information in it; however, it makes me feel very wary that trying to use the system trust store will be anything but an exercise in frustration.
I think our current approach is the correct one, considering who Requests was built for.
That being said, it wouldn't hurt to add more documentation/functionality around using system certs for "advanced" users.
I think talking about configurability is maybe a red herring. It's useful – necessary even – in certain circumstances, but users who know they need that can usually figure it out.
The more significant issue is that trust root database updates, especially for end-user client software, are both infrequent and extremely important to do in a timely manner. certifi has no mechanism for automatically updating. Not only will it not come down in a system update, it can't even be done globally; you have to do it once per Python environment (virtualenv, install, home directory, etc, where it's installed).
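To make that concrete (certifi.where() is certifi's real API): every environment carries its own private copy of the bundle, so updating one does nothing for the others.

import certifi

# Prints this environment's private copy of the bundle, e.g.
# /path/to/venv/lib/python2.7/site-packages/certifi/cacert.pem
print(certifi.where())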
@glyph I think you may have hit the nail on the head there. That is a significant reason (and addresses the other issues outlined) to use a more centralized location for the cert store.
The flip side is that you have trust stores like Debian which trusted CACert for a long time, and still trusts SPI even though neither of those have gone through any sort of real audit that you'd expect for a trusted CA.
The flip side is that you have trust stores like Debian which trusted CACert for a long time, and still trusts SPI even though neither of those have gone through any sort of real audit that you'd expect for a trusted CA.
As I understand it, Microsoft trusts their own CA, too, and Apple trusts theirs. SPI is just Debian's version of that, isn't it?
SPI isn't run by Debian, it's a third party organization similar to that of the Software Freedom Conservancy that Debian happens to be a member of. It'd be more like Microsoft and Apple trusting the CA of their datacenter just because they happened to use them as their data center. In addition, I'm pretty sure that Apple and Microsoft have both passed a WebTrust audit for their root CAs.
To be fair to Debian, I think the current plan is to stop using SPI certificates for their infrastructure and switch to more generally trusted certificates and then stop including SPI and switch to using just the Mozilla bundle without any additions.
That being said, is it even true that they are shipping updates to them? Looking at packages.debian.org for the ca-certificates package, the versions there are:
- squeeze (oldoldstable): 20090814+nmu3squeeze1
- wheezy (oldstable): 20130119+deb7u1
- jessie (stable): 20141019
- stretch (testing): 20160104
- sid (unstable): 20160104
I haven't looked at the actual contents of these packages, but the version numbers lead me to believe that they are not in fact keeping the ca-certificates package up to date.
In addition, looking at the open bugs for ca-certificates, there are bugs like #721976, which means that the ca-certificates store includes roots which are not valid for validating servers and are only valid for other purposes (like email). In other words, you can't actually use the current ca-certificates package without massaging it to remove those certificates yourself.
Another issue, #808600, has Comodo requesting the removal of a particular root that they no longer consider to be in scope for the CAB Forum's Baseline Requirements. That root has been removed from testing and sid, but has not been removed from jessie, wheezy, or squeeze. The maintainers of ca-certificates claim they'll be requesting an upload to jessie and wheezy, but not to squeeze (which still has LTS support).
That's just from spending a little bit of time looking at one, fairly popular, distribution. I imagine the long tail of the issues with the system provided bundle gets worse the further away from the popular distributions you get. It's not clear to me that it's a reasonable assumption that the certificates included in any random OS are going to be properly maintained.
The other elephant in the room is we're just assuming that because an update is available to their ca-certificates package that someone is going to have pulled it in. As far as I know, most (if not all?) of the Linux systems do not automatically update by default and require configuration to start to do so. There is likely to be a bigger chance of this with Docker in the picture. On the flip side, I think people generally try to update to the latest versions of their dependencies when working on a project.
Debian's cert bundle is almost certainly like Ubuntu's, which deliberately does not update to the Mozilla bundle that removed the 1024-bit roots, in order to avoid pain like that which hit certifi. All of the out-of-date cert bundles are paired with OpenSSL pre-1.0.2, which cannot correctly build the chain to a cross-signed root without the 1024-bit cross-signing root still being present. I suspect that's the real concern there.
Maybe they should stop shipping an OpenSSL that can't correctly validate certificate chains.
I reached out to Kurt Roeckx about backporting the fixes for that; he said it was a thing he was looking at doing, but I have no clue what the timeline is.
On Wed, Jan 13, 2016 at 7:27 AM, Donald Stufft [email protected] wrote:
Maybe they should stop shipping an OpenSSL that can't correctly validate certificate chains.
— Reply to this email directly or view it on GitHub https://github.com/kennethreitz/requests/issues/2966#issuecomment-171276369 .
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero GPG Key fingerprint: 125F 5C67 DFE9 4084
So, what if we made the behaviour of looking at the system CA bundle an extra, e.g. pip install requests[system_ca], where it would ship without our certificate bundle and instead use certifi? This allows us to keep "just working" for our default user base while supporting the people who need to use the system bundle.
Seems reasonable.
I would start with adding an API for it before worrying about how to enable it via installation; you need requests.get(url, verify=<something>) to exist first.
On Wed, Jan 13, 2016 at 7:41 AM, Ian Cordasco [email protected] wrote:
So, what if we made the behaviour of looking at the system CA bundle an extra, e.g., pip install requests[system_ca] where it would ship without our certificate bundle and instead use certifi. This allows us to keep on "just work" ing for our default user base while supporting the people who need to use the system bundle.
— Reply to this email directly or view it on GitHub https://github.com/kennethreitz/requests/issues/2966#issuecomment-171279745 .
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero GPG Key fingerprint: 125F 5C67 DFE9 4084
Debian / Ubuntu have more issues. They ignore the CKT_NSS_MUST_VERIFY_TRUST flag and throw all trust anchors into one PEM file. The flag overrides extended key usage (EKU) for root certs, e.g. to only trust a root cert for S/MIME but not for TLS server auth.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=721976 https://bugs.launchpad.net/ubuntu/+source/ca-certificates/+bug/1207004
Apple, Microsoft, and Red Hat/Fedora have means to disable certs for a given purpose. Red Hat only puts trust anchors for TLS server auth in the default cert bundle. With FreeDesktop.org's p11-kit and PKCS#11 bindings, the policies can even be modified by the user. This doesn't work for OpenSSL yet, because OpenSSL uses a static PEM file instead of PKCS#11 to fetch trust anchors.
@glyph Does MS trust their own cert for all EKUs or just for some EKUs like code signing?
@alex would something like
import requests
from requests import certs
requests.get(url, verify=certs.system_bundle_where())
be a sufficiently good API?
Note that that is already the API certitude provides.
SPI isn't run by Debian, it's a third party organization similar to that of the Software Freedom Conservancy that Debian happens to be a member of.
This is a bit of a tangent, but; no, not quite.
I'm pretty sure that Apple and Microsoft have both passed a WebTrust audit for their root CAs.
I checked and this is definitely true for Apple and seems to be true for Microsoft. So, yeah, SPI isn't cutting the mustard here.
SPI was originally created to allow the Debian Project to accept donations. It now acts as a fiscal sponsor to many free and open source projects.
originally is an important word here; it started as something run by Debian, but it's now fully independent AFAIK.
To sum up the argument thus far, I think it's:
<glyph> it's the OS's job to update certificates
<dstufft> debian does a bad job
I don't think these statements really conflict. And I don't disagree, at all: Debian is being somewhat negligent in their treatment of root CAs, and that could compromise Debian users' security. Somebody should be yelling at them about that. But from a threat modeling perspective, if we assume that Debian is including a bad / compromised CA, that means the get-pip.py you got was subverted in transit anyway, because curl is using the system CA list too. I'd still rather have a bad CA cert list from my OS than 50 bad CA cert lists from applications, since the OS is a lot easier to update centrally. This applies even to Docker containers: rebuilding an image with an upgrade line at the top should get the latest of everything, including trust roots.
The other side of that is, if the people who should do the job did it and we were happy with that, we'd just let OpenSSL pick our default cipher strings too.
The other side of that is, if the people who should do the job did it and we were happy with that, we'd just let OpenSSL pick our default cipher strings too.
It's a fair cop.
First, a couple of points:
- The SPI CA is no longer in ca-certificates as of version 20151214.
- The single PEM CA bundle is only used by software that is unable to use a directory of certificates. (Although the issue of conflating different types of trust root is real.)
I think the most interesting thing about the ca-certificates package is that it allows an administrator to enable/disable certificates if they so choose. Of course, an uninformed user isn't going to bother with this, but if you are interested/informed, and everything uses the system CA configuration, you have a central place to view and manage things; whereas if everything bypasses this and ships its own bundles, it's basically impossible to keep track of everything no matter what type of user you are.
(This is already an issue since the upstream versions of Firefox, Chrome, etc. all use their own certificate stores...)
I think true native support (i.e.: mapping "is this certificate okay?" requests to OS APIs) would be preferable. This would help with the problem of automatically-updating certificate stores on Windows, along with supporting enterprise certificate authorities. Here, I had to manually import our root certificates into the certificate bundle, and if they get reissued across the domain, we'd need to repeat the process (a step that could easily get lost over time).
Is that possible? Or is certificate verification ultimately handled by Python/urllib, and Requests merely supplies the bundle?
(This is already an issue since the upstream versions of Firefox, Chrome, etc. all use their own certificate stores...)
I think Firefox is the only one that has its own store, Chrome (at least on Windows) uses the system store.
I think true native support (i.e.: mapping "is this certificate okay?" requests to OS APIs) would be preferable.
That is very nearly impossible.
The issue there is that there are three major platforms (Windows, OS X, other), each of which ships with their own "OS-default" library for validating TLS certificate chains. For obvious reasons, these libraries all also believe they are in charge of doing the TLS record building and handshaking and all the other fun stuff. That means that integrating with those libraries requires writing three separate wrapper libraries, each of which may fail in exciting and different ways.
Getting those to function together correctly would be extremely difficult. Worse, I rolled all the "other" platforms together as though they'd all provide OpenSSL, but in fact many would either not provide it (e.g. BSDs providing LibreSSL), or the user would prefer we used a different library (GnuTLS, PolarSSL, NSS).
It should be noted that, as far as I can tell, literally the only piece of web software that attempts to do this is curl: not even web browsers try. Firefox bundles its own TLS library (NSS), as does Chrome (BoringSSL), while Safari and Edge do not run across multiple platforms and so can stick their hooks into the OS-provided libraries. Even curl does not appear to hook into Windows' default store.
Currently the assumption is that any change here would simply adjust to use a PEM bundle provided by the OS. On *nix, this should be in one of a few well-known locations, and certitude could simply proxy to it. On OS X, we can extract the certificates from the keychain and write them into a PEM file: this is what certitude does today. On Windows we could in principle do this, but there's some trickiness about the fact that Windows dynamically fetches some root certificates (a truly mind-boggling design decision).
Chrome uses the platform trust stores, including their provided functions for checking validity.
According to https://www.chromium.org/Home/chromium-security/root-ca-policy Chrome only uses the platform trust store on Windows and OSX.
Because an issue in pip made me think about this again!
On *nix, this should be in one of a few well-known locations, and certitude could simply proxy to it.
I think this is wrong. It's the method currently in use by pip, but we're moving away from it: just because a file exists in one of the "well known locations" doesn't mean it's the right file for that system, and assuming it is opens people up to using a stale CA bundle that isn't being managed by anything. If you're going to do anything, it should be to utilize the OpenSSL APIs to get the default OpenSSL CA bundle.
That brings me to the next point in this, currently the OpenSSL APIs to get the default CA bundle are broken on Debian. Attempting to query it gives you a location that doesn't exist because it is the wrong location. This is #805646 which results in (on Debian Stable):
$ python -c "import ssl; print(ssl.get_default_verify_paths())"
DefaultVerifyPaths(cafile=None, capath='/usr/lib/ssl/certs', openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='/usr/lib/ssl/cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='/usr/lib/ssl/certs')
$ python3 -c "import ssl; print(ssl.get_default_verify_paths())"
DefaultVerifyPaths(cafile=None, capath='/usr/lib/ssl/certs', openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='/usr/lib/ssl/cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='/usr/lib/ssl/certs')
The openssl_cafile should be /etc/ssl/certs/ca-certificates.crt (I think). I am unsure what the full impact of this actually is in terms of what versions of Debian and if it extends to any of the Debian derivatives like Ubuntu.
Echoing the Chrome policy that @dstufft referenced above, I think that maybe the right thing to do here is (for now) to trust the Windows and OS X trust stores (inasmuch as we can) since Microsoft and Apple have demonstrated a baseline level of trustworthiness in their management of certificates.
It seems like neither Debian nor Red Hat is similarly responsible, at least, at this time.
@dstufft asks: what is different about trusting the OS's cert stores versus trusting the OS to properly configure OpenSSL? In the Microsoft and Apple case, they knew better than to ship OpenSSL, because it's not actually a proper transport security solution: it's part of a construction kit for building your own transport security. They built their own, more complete TLS implementations, which have interfaces that let you do things like select trust roots in a sensible way. In a perfect world, IMHO, we'd be using cryptography.tls, which would actually back-end to SChannel or Secure Transport on those platforms, and bundle its own OpenSSL only for Linux. Given that we're a long way away from that world (just in terms of the amount of work it would take), the next best thing is to bundle OpenSSL for the protocol implementation on all platforms but trust the platform trust store on those platforms which seem trustworthy, which hopefully eventually will be all of them.
@glyph Well, except AFAIK Windows doesn't provide a way to enumerate the SSL certificates such that you actually get them all, not just the ones you've seen. Perhaps you can use SChannel for that (is that a thing you can do? last time I saw this, someone was trying to replace urllib3 with WinHTTP to get requests to trust the Windows trust store), but then if you're relying on SChannel you're also relying on Microsoft for some other things, which might be undesirable. It'd mean that there's no more SNI for Windows XP users, and no ECDHE+AESGCM for anyone not on Windows 10.
As far as I know, the only way to get ideal security right now on any platform in a way that isn't tied to the absolute latest versions of that platform is to bring your own TLS library and your own certificates.
Honestly, the single greatest thing that could probably be done to increase security is for cryptography to start shipping statically compiled wheels on Linux too so you can get a modern TLS implementation on all the platforms.
I think the best solution is to use each platform's native TLS implementation. On Windows, that's SChannel; on Mac, that's SecureTransport; on Fedora, that appears to be NSS; elsewhere, it's probably OpenSSL. That would make the host platform responsible for certificate verification, including dynamic downloading of new CA certs where necessary (on Windows). Also, on platforms where OpenSSL is not available, let alone the native solution (Windows, Android via the small set of libraries exported by the NDK), it would be great if application developers didn't have to bundle their own copy of a crypto/TLS library and keep it up to date.
Of course, implementing an abstraction over all of these libraries is a lot of work. libcurl has already done this in its vtls module. But that's currently internal to libcurl, and I assume that modifying requests to use libcurl instead of urllib3 is a bigger change than anyone here wants to make. However, the curl developers previously did some work on exporting vtls as a separate library; maybe we could convince them to revive that project.
It's bad enough that we have to use OpenSSL (or the platform SSL bindings); cURL is written in C, and like all software written in C, we periodically discover, much to our surprise, that it contains remote code execution and crashing vulnerabilities. So personally I'd rather avoid it here.
Ok, so let's think about this in a slightly different way. Right now the discussion has focused on "where do we get the certificates", but that's not really a sufficiently detailed question, because it just doesn't work on Windows.
So let's rephrase the question: should we be using OpenSSL to verify the certificate on Windows/OS X? With PyOpenSSL, at the very least, it should be possible for us to do what Chrome does and intervene in the verify_callback, passing the certificate to the OS-appropriate verification functions.
Chrome has Windows-specific code for this and OS X-specific code for this. It is in principle possible to pull this functionality into cryptography and then, on Mac and Windows, essentially throw the verification out to the OS-specific functions rather than using OpenSSL.
My reservation about this approach is that the Python standard library does not make this approach available to us. However, we could potentially have the following approach:
- Check if cryptography and PyOpenSSL are installed. If they are, and the user is on Windows or OS X, use the system cert bundle as exposed by the system's own APIs.
- (Maybe, maybe not) Otherwise, if certitude is installed, use that cert bundle.
- Otherwise, if certifi is installed, use that cert bundle.
- Otherwise, use the vendored bundle.
Thoughts?
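A rough sketch of that cascade (the platform check here is an assumption, and certitude is assumed to expose a certifi-style where(), per the comment earlier in this thread):

import os.path
import sys

def preferred_trust_source():
    if sys.platform in ("darwin", "win32"):
        try:
            import OpenSSL  # PyOpenSSL pulls in cryptography
            return ("system-apis", None)  # verify via the OS's own store
        except ImportError:
            pass
    try:
        import certitude  # assumed API, mirroring certifi
        return ("bundle", certitude.where())
    except ImportError:
        pass
    try:
        import certifi
        return ("bundle", certifi.where())
    except ImportError:
        # vendored bundle shipped inside requests
        return ("bundle", os.path.join(os.path.dirname(__file__), "cacert.pem"))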
Just to further hammer home my point:
- Homebrew on OSX ships with a broken CAPath because it's empty but the directory exists.
- Debian ships with a broken CAFile which doesn't exist.
- CentOS and Fedora ship with a broken CAPath which is just some symlinks and doesn't work.
Pip is giving up on trying to trust the system stores; systems can't be trusted to ship things in a state that isn't broken. See https://github.com/pypa/pip/issues/3415.
Just to further hammer home my point:
- "Homebrew on OS X" is not a "system store"; it is an attempt to export the dynamic, user-centric system store to a static, globally-applicable store.
- Debian and CentOS are sort of broken
- The Pip issue you linked is also on a Linux system.
All of which seems consistent with the idea that Linux is broken (at least for now) but other platforms work OK (if we can figure out how to make them happen correctly, although that is an open issue for Windows).
It sounds like maybe Certitude should take this same direction, and depend on Certifi for Linux?
Debian and CentOS aren't broken. On both systems the configuration works fine with SSL_CTX_set_default_verify_paths(). Only one of CAfile and CApath needs to work for SSL_CTX_set_default_verify_paths(). The patch tried to outsmart the system and work around the default API. Also, requests can't handle CAfile and CApath at the same time. The combination caused the bug.
I understand @dstufft's motivation for the heuristic. OpenSSL doesn't report whether SSL_CTX_set_default_verify_paths() has loaded any certs or has found a CApath with valid certs. For CAfile, SSLContext.get_ca_certs() helps, but for CApath you are lost.
certifi isn't an option. At best it's a matter of last resort for pip.
An API exists, and on those platforms it returns a wrong or invalid value, that's pretty solidly broken in my book. In Python, if sys.stdout.write() on these platforms didn't actually write to sys.stdout we wouldn't be saying "Python isn't broken, you can still use print()", so why is this any different?
The difference is that the old approach wasn't using the defined API to load the default location. That's SSL_CTX_set_default_verify_paths(). Instead pip tried to mimic SSL_CTX_set_default_verify_paths() but only did half of the job.
@glyph Certitude can get this right by ignoring what the distro OpenSSL tells us and doing what curl does instead, which is to hardcode the list of paths and walk down them, one-by-one, until we find the bundle.
Credit to @bagder for that approach, especially as I'm just going to steal it.
:+1:
Isn't this problem already solved in Python 2.7.10+? urllib2 can verify certificates out-of-the-box. Why can't requests use the same solution?
@shypike Let's clear some things up. =)
Firstly, requests can and does verify certificates out of the box, and has been doing so for years: much longer than the standard library, and much more effectively. The question is not "should we verify certificates", it's "how should we verify certificates".
Secondly, no, this problem is not solved in Python 2.7.10+. In those versions of Python, the standard library modules use the SSLContext object returned from create_default_context. That function calls load_default_certs.
Unfortunately, it turns out that load_default_certs is a relatively deficient method. It suffers from some flaws.
Firstly, it only handles two cases: Windows, and all other platforms. On Windows it uses the enum_certificates method, which ends up calling into CertEnumCertificatesInStore. It grabs those certificates and passes them into OpenSSL to do the verification. This has two problems: firstly, it means that the certificate verification logic is different between Python and, for example, Microsoft Edge, because a different library is used to construct the certificate chain and validate the certificates. Secondly, not all root certificates will be available to Python when this method is called, because SChannel occasionally dynamically fetches root or intermediate certificates on-demand. This leads to unexpected, tricky behaviour.
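For reference, this is roughly what the standard library does on Windows. ssl.enum_certificates is the real (Windows-only, 3.4+) API; the snippet below is a simplified rendering of CPython's internal logic, not requests code:

import ssl

def windows_server_auth_roots():
    # Enumerate the local "CA" and "ROOT" stores and keep DER certs
    # trusted for server auth. Roots that SChannel would only fetch
    # on demand never show up here.
    certs = bytearray()
    for storename in ("CA", "ROOT"):
        for cert, encoding, trust in ssl.enum_certificates(storename):
            if encoding == "x509_asn":
                if trust is True or ssl.Purpose.SERVER_AUTH.oid in trust:
                    certs.extend(cert)
    return bytes(certs)

# The collected bytes are then handed to OpenSSL, which builds and
# validates the chain itself, differently than SChannel would.
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
ctx.load_verify_locations(cadata=windows_server_auth_roots())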
One of the "other platforms" that Python ignores is Mac OS X. On this platform, everything is sad again, because Mac OS X has a complex keychain system that allows substantial editing of trust. However, OpenSSL pays absolutely no attention to this distinction, which means that a user will be in one of three cases:
- They'll be using the built-in OpenSSL, OpenSSL 0.9.8zg. This version calls into the OS X cert store to do the validation, so it's the most likely to get the right answer, but it's also unsupported and by-default insecure and should never be used.
- They'll be using an OpenSSL shipped with Python. If they do this, it's not clear where that OpenSSL will get its certificates from: probably nowhere.
- They'll be using an OpenSSL installed from Homebrew or somewhere equivalent. These places usually copy all the certificates out of the keychain at the time of install, but then do no refreshing of the cert store. This means that if OS X subsequently pushes out an update removing one of the root certificates from the store, this change will not propagate to the verifier, which will continue to trust the cert.
Additionally, as in the Windows case, OS X's Security.framework has very complex validation logic that differs from OpenSSL's. For example, users can mark certificates as trusted for only specific websites, or trusted only for specific uses: OpenSSL will not respect those distinctions.
As a result, the only platforms on which this is "solved" by the standard library are ones that ship OpenSSL by default, compile it such that the default verify locations point to the right places, and appropriately configure their trust stores. That amounts to some Linuxes, and some other Unix-based OSes. That is officially not good enough.
Right now requests is doing better than the standard library because it carefully polices trust on all platforms, not just the ones the standard library deigns to support. This issue is about whether we can do even better by allowing the platform-specific logic to take over on Windows/OS X, and then as a result transitioning to use the default certificate stores on all platforms.
To be clear: we will never take a decision here that leads to a preference for Unix-like operating systems. That kind of parochial thinking is what leads to Windows users feeling like second-class citizens in the Python community. My current thinking here is that I'd like to emulate Chrome: use OpenSSL to perform the actual TLS, but have the platform-native libraries do the validation if possible. That gets us the closest to "native" behaviour on the platform in question.
Unfortunately, it turns out that load_default_certs is a relatively deficient method. It suffers from some flaws.
In the case of Linux, couldn't we rely on distros patching Python 2.7.10 to correctly find the system trust stores? I'd imagine that Debian people, especially, would have no scruples about doing so.
@untitaker Yes, we can, and so I'm not worried about Linux. Linux is the easy case here. It's everything else that is tricky.
@Lukasa Thanks for your explanation. SSL and certificates are even more of a minefield than I already thought. My "simple" problem is that requests rejects quite a few valid certificates, which are accepted by Python-urllib2, Firefox, IE and Chrome. Installing certifi doesn't resolve this issue.
@shypike My psychic powers tell me that your OpenSSL is pretty old. Take a read through of the discussion on certifi/python-certifi#26.
@Lukasa Maybe. I'm using 2.7.11 on Windows and forced to use eGenix.com's pyOpenSSL binaries (because those are the only ones that don't crash on me). They are at OpenSSL level 1.0.1q. Even so, I think further discussion of my specific problem doesn't contribute much to the larger issue being discussed here. Thanks for your time.
On Twitter, @Lukasa responded to my suggestion that requests might punt on this problem by using libcurl as a transport, presumably via pycurl. @Lukasa's concern is that depending on libcurl and pycurl would make requests more difficult to install. Using libcurl is probably off the table anyway; @glyph was right to point out the inherent security risk in such a complex C library. Since @Lukasa pointed out that pycurl is only distributed as source and as a .exe installer for Windows, and not as a wheel for Windows and OS X, I think we can eliminate it from discussion.
I like @Lukasa's idea of using the native APIs on Windows and OS X to do the certificate verification, and configuring OpenSSL to use those APIs through a callback. Does pyOpenSSL support this yet? Of course, to use the appropriate APIs on Windows and OS X, it would be necessary to use ctypes, cffi, or a C extension module. Given that the latter is problematic with PyPy, we can probably eliminate that option. pyOpenSSL already uses cffi (indirectly, via cryptography), so that's probably the best option. So would we create bindings for the certificate verification APIs of Windows and OS X as separate Python packages, and add those packages to requests' extras_require list?
@mwcampbell PyOpenSSL does not support this, and never will, because PyOpenSSL is a thin wrapper library around OpenSSL, and so doesn't support the relevant APIs.
However, I recently got merged into cryptography the relevant bindings for OS X (pyca/cryptography#2683), and I believe that a cryptography with that change in it has been released now. I've also used those bindings to successfully use OS X to validate a certificate chain. A similar approach can probably be used on Windows and the cryptography developers have expressed a willingness to bind the appropriate functions. Given that PyOpenSSL depends on cryptography, this is far and away the simplest route.
My current proposal is to add the relevant functionality into the urllib3 PyOpenSSL shim. I've briefly discussed this with @shazow, who was open to the idea. Then, urllib3 would allow urllib3.contrib.pyopenssl.inject_into_urllib3() to take a parameter (system_trust=True, defaulting to False) that will automatically use the system trust store instead of OpenSSL on the relevant platforms.
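Usage might then look like this (to be clear, the system_trust parameter is the proposal above, not something released urllib3 supports):

from urllib3.contrib import pyopenssl

# Hypothetical: route certificate verification through the OS trust
# store on platforms where the bindings exist.
pyopenssl.inject_into_urllib3(system_trust=True)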
The question then would become whether the requests project can come to consensus on setting that parameter to True. =) We should burn that bridge when we get to it: for now, I'd like to get the building blocks in place.
In an ideal world we'd actually pull the OS-specific logic out into its own library, so that it can be meaningfully tested, but that's a pretty tricky goal. On the other hand, if we can pull it off then we have both provided a really useful service to the Python community in general and potentially helped move towards #2118 by providing a PyOpenSSL SSLContext equivalent. This I think would be my preferred outcome.
My main concern with this discussion is the potential impacts proposed solutions could have on the user experience. So, to be clear, here are some guidelines:
Guidelines
- Installation of Requests cannot require any compilation, and should require no external dependencies.
- The porcelain API should not need to be changed in any way to accommodate these changes, except to provide new functionality.
Accommodating SSL is often where HTTP clients fall apart, and the user experience gets compromised in the name of security. Requests' current solution is an excellent one, though not perfect. With it, we were able to bring seamless proper TLS verification to countless developers that would not have used it otherwise.
Any suggested improvements to this approach need to have these design goals in mind.
I understand why installation should require no compilation, though I would limit that to "no compilation on Windows", since I think requiring compilation on Mac and Linux is entirely reasonable. But why no required external dependencies at all? pip does a great job of handling external dependencies, and all Python users should be using pip now. So, for example, I think it would be fine to move pyOpenSSL from extras_require to install_requires.
@mwcampbell because that is a requirement of this library. C compilation is rarely a seamless experience and is the #1 source of end-user confusion/frustration when it comes to package installation. It would come up very often, especially for a library as popular as Requests. You shouldn't need to compile C code to make HTTP Requests properly.
No dependencies is simply a design decision. One that could be rethought if needed, but again, it's about providing a simple and carefully crafted user experience, beginning to end. Of course, Pip works extremely well, and dependencies are very standard fare, but, Requests aims to be better than the status quo. You shouldn't need 5 dependencies to make HTTP requests — you should need 1. :)
@mwcampbell External dependencies could result in cases of version conflicts. One application could require pyOpenSSL in a specific version, while requests needs another. In some cases, (mostly enterprise) you're locked to older versions due to dependent code. Reworking that code for a newer version could be tedious or impossible, depending on what else depends on that version. (Especially where OpenSSL is involved)
To be clear, I agree with Kenneth. I doubt this will change the default, but I would like to make it possible to use requests with the system trust stores when it is possible to do so. PyOpenSSL makes it possible, so I'm inclined to use that as leverage.
I also agree with @kennethreitz. This is a laudable goal for any library, but given that requests is the transport for how you get other libraries, it's particularly important here. However, "no compilation" does not need to mean "no C code" any more, with the advent of the https://github.com/manylinux organization, and the possibility of allowing of linux wheels on PyPI.
Compilation on OS X requires installation of the Xcode command line tools, which requires interactive user acceptance of an EULA, and the error messages that users receive when these tools are not present are totally inscrutable. Part of the problem here rests with pip, of course; it should determine if a package needs compilation and just straight up tell you "run xcode-select --install" and not barf out a traceback about a missing executable.
For the record, no compilation does mean no C code if requests is going to continue to be used by pip. We cannot have any mandatory code that is not pure Python and is not 2.x / 3.x single source. Binary wheels are not acceptable for us. Of course requests could choose to go that way, but if it's mandatory then we'll have to figure out a different solution in pip.
I looked into how Chrome integrates with Windows' SSL/TLS certificate validation, and all it takes is a few simple API calls. I have a proof-of-concept in C that can be translated to a few ctypes calls. Where does trust verification take place now? Is it strictly in native code, or can it be overridden in Python? (Without monkeypatching around socket calls.)
@smiley - If you have extracted this information from Chrome, perhaps you could comment on my Stack Overflow question, for posterity?
@smiley With OS X I plan to hook into the urllib3.contrib.pyopenssl module and interfere with ssl_wrap_socket. My PoC just waited until after the handshake to do the verification, so you can do that too and then provide us with the PoC code. In the proper code it would be better to hook into the verify callback so that we can fail the handshake, but that is a bit trickier.
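As a sketch of that hook point: PyOpenSSL's Context.set_verify() (a real API) accepts a callback that can overrule OpenSSL's verdict; validate_with_platform below is a hypothetical stand-in for the OS-specific check (certitude on OS X, CryptoAPI on Windows):

from OpenSSL import SSL

def os_verify_callback(conn, x509, errno, depth, preverify_ok):
    # Called once per certificate in the chain during the handshake.
    # At depth 0 (the leaf) we can hand the presented chain to a
    # platform API instead of trusting OpenSSL's preverify result.
    if depth == 0:
        return validate_with_platform(conn.get_peer_cert_chain())
    return True  # defer the trust decision until we see the leaf

ctx = SSL.Context(SSL.TLSv1_2_METHOD)
ctx.set_verify(SSL.VERIFY_PEER, os_verify_callback)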
I still think that this is the right thing to do, but I also think that the technology simply doesn't exist yet to facilitate it right now. Here is where I officially gave up (again: for now): https://twistedmatrix.com/trac/ticket/8201
@glyph an excellent decision :) (although, an unfortunate one)
Maybe we should just invent a standard (env var or something?) and make OS vendors comply to it if they want to support their own developers.
See shazow/urllib3#802.
Finally got around to doing the Windows implementation, and it looks like the configurable callback in PyOpenSSL's Context.set_verify(mode, callback) will be a good place to implement this. (as @Lukasa suggested) The only downside is that this will force Windows users to install ndg-httpsclient, but a simple pip install ndg-httpsclient worked perfectly, so that might not be so bad.
However, I noticed the PR in urllib3 and certitude. Should I implement it there somehow, or directly in Requests?
EDIT: I just noticed the native part of Certitude. Guess there's already an implementation. Is this still needed or just waiting on urllib3 to merge?
@smiley It's waiting on a few other things. The native part of certitude needs a setup for wheel builders, which I haven't gotten around to providing yet.
@Lukasa It's on my list! (But you should remind me on occasion)
The native part of certitude
What's the point in having a native part which requires compilation on install, when it could be rewritten using ctypes (similarly to wincertstore) and therefore require neither compilation nor any dependency (apart from the stdlib)?
@piotrjurkiewicz It won't require compilation on install, we'll distribute wheels. =)
And generally speaking ctypes doesn't behave well on PyPy (it's very slow), and PyPy is a first-class supported platform for requests. =)
Current Thoughts
I still think this should be disastrous, and everything in me says no. But I'm confident that if this does happen, it will be because it is implemented perfectly and seamlessly.
Landing in 4.0 vs. 3.0
If this does happen, it may be best for this to be the entire premise of a 4.0 release (3.0 need not wait for this, and already contains a large number of changes — smaller changesets are desirable).
Official Beta
I declare that this change will require an official beta release [for the first (and hopefully last) time] with an official call for feedback. This may be bypassed if we are 100% confident of zero complications for users, but if our confidence level is at 95%, a beta release with a call for testing/feedback is mandatory. As usual, we can carefully oscillate around the terminating line between confidence and arrogance while assessing this.
✨ 🍰 ✨
I still think this should be disastrous, and everything in me says no.
This issue has been really illuminating for me. But what do you mean by "disastrous"? It sounds like you think the implementation will just fail to work somehow?
@glyph
This issue has been really illuminating for me. But what do you mean by "disastrous"? It sounds like you think the implementation will just fail to work somehow?
I just don't like anything about it, and it raises every red flag in my book. It will take a currently very simple implementation of very reliable behavior, and replace it with a very complicated implementation (perhaps just complex — I personally avoid this layer at all costs, hence the simplicity of the current implementation, so this all seems quite complicated to me) with behavior that I fear may be far less reliable, in the name of security.
So ensuring the same level of reliability is my number one concern. Zero user-facing changes to installation behaviors are my second concern (NO compilation acceptable, and external dependencies should also be absent). My third concern, which is a large one, is that Requests will now behave differently on different systems — avoiding this was an explicit design decision when I designed it.
So, my reservations are large, and many. But, I'm open to it, tentatively. Everyone else cares about security far more than I do. I want to support that, but not at any expense to the above mentioned aspects of this project.
NO compilation acceptable
I think this more or less kills the idea as a mandatory thing. Maybe an opt-in thing.
@dstufft if that's true, perhaps we can rig it up similar to the current auto-use of PyOpenSSL, if available.
Taken further, it could be a package like requests-systemcerts, included in requests[security].
Folks, pkg_resources solves this problem and it's been around forever.
# requests/certs.py
import os.path
import pkg_resources

for entry_point in pkg_resources.iter_entry_points(
        group='requests.ca_bundle', name=None):
    where = entry_point.load()
    break
else:
    try:
        from certifi import where
    except ImportError:
        def where():
            """Return the preferred certificate bundle."""
            # vendored bundle inside Requests
            return os.path.join(os.path.dirname(__file__), 'cacert.pem')

if __name__ == '__main__':
    print(where())
This resolves every concern.
- Anyone who wants to extend the CA bundle loading behavior simply implements a plugin.
- No new dependencies, no compiling. It's on me to build/distribute my plugin.
- It gracefully falls back to its current behavior. No breaking changes.
- It works when it's vendored (as it is in pip).
- People like me can override the built-in behavior without resorting to modifying the source and shipping a custom package.
- Distro maintainers could ship a plugin instead of modifying requests like they do today.
@dstufft how does this sound to you?
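For example, a plugin (hypothetical names throughout) would just register an entry point in its setup.py, and the loop above would pick it up:

# setup.py for a hypothetical requests-systemcerts plugin
from setuptools import setup

setup(
    name="requests-systemcerts",
    version="1.0",
    py_modules=["requests_systemcerts"],
    entry_points={
        # requests/certs.py iterates this group and uses the first hit
        "requests.ca_bundle": [
            "system = requests_systemcerts:where",
        ],
    },
)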
FWIW I'd be happy for this to live in urllib3.
It doesn't bother me, but it'd be a non-optional dependency on setuptools for requests, unless you also handled the "doesn't have setuptools" case by falling back to the current behavior.
Requests already assumes setuptools is available.
https://github.com/kennethreitz/requests/blob/master/setup.py#L9
Only at build time.
@tmehlinger A build-time/install-time dependency is not an actual runtime dependency.
D'oh, you got me there
My concern with using pkg_resources/entry-points is that someone can very easily create a malicious plugin with a set of roots that will allow them to MITM your connection.
Totally valid concern. My use case is to support trusting an internal CA without having to jump through hoops. I hadn't considered that some jerk could upload a malicious package to PyPI and wreak havoc.