asgiref icon indicating copy to clipboard operation
asgiref copied to clipboard

Specify access to transport information

Open hashbrowncipher opened this issue 4 years ago • 21 comments

Use case: I would like to access the peer certificate sent by a TLS client within my ASGI app.

Issue: As far as I can tell, no information about the transport (including TLS session information) is available to the App.

In a fork of uvicorn, I have made a patch which exposes the full transport object as part of the scope passed to the app.

diff --git a/uvicorn/protocols/http/h11_impl.py b/uvicorn/protocols/http/h11_impl.py
index 240cb35..cf5fd67 100644
--- a/uvicorn/protocols/http/h11_impl.py
+++ b/uvicorn/protocols/http/h11_impl.py
@@ -191,6 +191,7 @@ class H11Protocol(asyncio.Protocol):
                     "raw_path": raw_path,
                     "query_string": query_string,
                     "headers": self.headers,
+                    "transport": self.transport,
                 }

Within my app code, I use scope['transport'].get_extra_info("ssl_object").getpeercert(binary_form=True) to access the relevant information.

Feature request: Specify a way for applications to fetch information about the transport they are communicating over.

See also: https://github.com/encode/uvicorn/issues/400

hashbrowncipher avatar Aug 08 '19 21:08 hashbrowncipher

Hm, one problem with this is that the ASGI events are deliberately limited to a JSON-plus-bytes-type set of potential values; passing a full object like this is sort of difficult to specify.

Is there any prior art in WSGI apps for doing similar things? I totally see the use case for information from client certificates especially, but I'm wary of specifying it rather than leaving it to servers to experiment with and decide on what would be best.

andrewgodwin avatar Aug 08 '19 23:08 andrewgodwin

It looks like uwsgi provides a nonstandard HTTPS_CLIENT_CERTIFICATE, with the raw bytes of the client cert. I was unable to find anything in gunicorn implementing client certificates, except for this gist where someone implemented a custom worker to access information from the client cert. So it looks like the Python ecosystem doesn't yet have much prior art with regards to accessing the TLS session information.

I'm certainly not wedded to the very wide interface suggested by my sample patch. I suggest taking a look at specifying access to some subset of BaseTransport.get_extra_info(), and moving from there.

hashbrowncipher avatar Aug 09 '19 02:08 hashbrowncipher

Right, I think some level of that information would be a useful optional extra to the scope for servers to provide, though like with the raw path info, it's impossible to provide in all cases (e.g. where a loadbalancer is terminating SSL).

I'm not particularly deep in this one, so while I like the idea, I'd rather someone else comes up with the proposed set of things to include, the reasoning about why they are, and then the PR to add it to the spec.

andrewgodwin avatar Aug 09 '19 23:08 andrewgodwin

My two cents on this would be "let's define the exact variables needed for client certificates". Passing a raw handle to a transport isn't going to be portable across different HTTP versions

jlaine avatar Aug 30 '19 11:08 jlaine

Apache's mod_ssl has a way to do this, which some apps will be using already. Please consider reusing some of this interface.

See the "Environment Variables" section of:

https://httpd.apache.org/docs/2.4/mod/mod_ssl.html

In particular:

  • SSL_CLIENT_CERT_CHAIN_0 - the client certificate
  • SSL_CLIENT_CERT_CHAIN_<n> - PEM-encoded certificates in client certificate chain that was sent by the client. <n> is a positive nonzero integer. SSL_CLIENT_CERT_CHAIN_<n+1> is the certificate that signed SSL_CLIENT_CERT_CHAIN_<n>. The last entry in the chain should be signed by a known CA root.
  • SSL_CLIENT_VERIFY - Client cert verification status:
    • "NONE" if there is no client certificate
    • "SUCCESS" if client certificate validation succeeded
    • "GENEROUS" ??? Not sure when Apache uses this; probably not needed ???
    • "FAILED:<reason>" if client certificate validation failed and the webserver is configured to pass the request to ASGI anyway; where <reason> is the reason for the verification failure.
  • SSL_SERVER_CERT - PEM-encoded server certificate

Also worth considering, although perhaps not using directly:

  • SSL_PROTOCOL - The SSL protocol version (SSLv3, TLSv1, TLSv1.1, TLSv1.2) (Alternatively, would probably be better if ASGI passed this as a numeric TLS version number)
  • SSL_CIPHER / SSL_CIPHER_USEKEYSIZE - The cipher being used (not sure if this is the best format - would probably be better if ASGI specified the numeric TLS identifier for the cipher suite instead).

The many other fields that Apache support are mainly extracting certain fields from the certificates. A Python app can just parse the PEM-encoded certificates to get those fields, so I don't think they add enough value to justify the cost of supporting them in ASGI servers and middleware.

Jon-Work avatar Feb 25 '20 15:02 Jon-Work

Here's a proposal, for TLS extensions to the connection scope for both HTTP and Websockets (and, potentially, for other protocols that use TLS):

  • tls_used (bool) -- True if this connection is over TLS, false otherwise. Optional; defaults to false.

  • tls_client_cert_chain (Iterable[Unicode string]) -- An iterable of Unicode strings, where each string is a PEM-encoded x509 certificate. The first certificate is the client certificate. Any subsequent certificates are part of the certificate chain sent by the client, with each certificate signing the preceeding one. Only applicable if the connection was over TLS; for non-TLS connections or if the client did not provide a client certificate then it will be an empty iterable. Some web server implementations may be unable to provide this (e.g. if TLS is terminated by a separate proxy or load balancer). Optional; defaults to empty list.

  • tls_server_cert (Optional[Unicode string]) -- The PEM-encoded x509 certificate sent by the server when establishing the TLS connection. Only applicable if the connection was over TLS; for non-TLS connections then it will be None. Some web server implementations may be unable to provide this (e.g. if TLS is terminated by a separate proxy or load balancer). Optional; defaults to None.

  • tls_client_cert_error (Optional[Unicode string]) -- None if a client certificate was provided and successfully verified, or was not provided. If a client certificate was provided but verification failed, this is a non-empty string containing an error message or error code indicating why validation failed; the details are web server specific. Most web server implementations will reject the connection if the client certificate verification failed, instead of setting this value. However, some may be configured to allow the connection anyway. This is especially useful when testing that client certificates are supported properly by the client - it allows a response containing an error message that can be presented to a human, instead of just refusing the connection. Optional; defaults to None.

  • tls_version (Optional[int]) -- The TLS version in use. This uses the version numbers as defined in the TLS specifications, which is an unsigned integer. Common values include 0x0303 for TLS 1.2 or 0x0304 for TLS 1.3. If TLS is not in use, set to None. Some web server implementations may be unable to provide this (e.g. if TLS is terminated by a separate proxy or load balancer); in that case set to None. Optional; defaults to None.

  • tls_cipher_suite (Optional[Iterable[int, int]]) -- The TLS cipher suite that is being used. This is a pair of unsigned integers specified in the relevant RFC, for example [0x13, 0x01] for TLS_AES_128_GCM_SHA256. If TLS is not in use, set to None. Some web server implementations may be unable to provide this (e.g. if TLS is terminated by a separate proxy or load balancer); in that case set to None. Optional; defaults to None.

Jon-Work avatar Feb 25 '20 16:02 Jon-Work

@Jon-Work I like what you've got here but instead of a PEM encoded certs (or a binary value as proposed in the issue) it seems that using a dictionary similar to what is returned by getpeercert(). This would eliminate the need to decode the certs to get the necessary info. Here's a (lightly edited) example I pulled from SSLObject.getpeercert:

{
'subject': (
  (('domainComponent', 'com'),), 
  (('domainComponent', 'foobar'),),
  (('organizationalUnitName', 'Users'),), 
  (('commonName', 'jwatson'),), 
  (('emailAddress', '[email protected]'),)
), 
'issuer': (
  (('domainComponent', 'com'),), 
  (('domainComponent', 'foobar'),), 
  (('commonName', 'FB-SUB-CA'),)
), 
'version': 3, 
'serialNumber': 'ABCDEF0123456789', 
'notBefore': 'Mar  2 15:54:35 2020 GMT', 
'notAfter': 'Mar  2 15:54:35 2021 GMT', 
'subjectAltName': (
  ('othername', '<unsupported>'), 
  ('email', '[email protected]')
),
'OCSP': ('http://ca.foobar.com/ocsp',), 
'caIssuers': ('http://ca.foobar.com/FB-SUB-CA.crt',), 
'crlDistributionPoints': ('http://ca.foobar.com/crl/FB-CA.crl',)
}

I could be missing something but some of that looks messy to me e.g. unnecessary nested tuples in the subject and issuer sections. So something like this perhaps?

{
'subject': (
  ('domainComponent', 'com'),
  ('domainComponent', 'foobar'),
  ('organizationalUnitName', 'Users'), 
  ('commonName', 'jwatson'), 
  ('emailAddress', '[email protected]')
), 
'issuer': (
  ('domainComponent', 'com'), 
  ('domainComponent', 'foobar'), 
  ('commonName', 'FB-SUB-CA')
), 
'version': 3, 
'serialNumber': 'ABCDEF0123456789', 
'notBefore': 'Mar  2 15:54:35 2020 GMT', 
'notAfter': 'Mar  2 15:54:35 2021 GMT', 
'subjectAltName': (
  ('othername', '<unsupported>'), 
  ('email', '[email protected]')
),
'OCSP': ('http://ca.foobar.com/ocsp'), 
'caIssuers': ('http://ca.foobar.com/FB-SUB-CA.crt'), 
'crlDistributionPoints': ('http://ca.foobar.com/crl/FB-CA.crl')
}

Upon looking at the docs for getpeercert() and it seems that the extra level of tuples is 'as designed'. I don't get why it's necessary but perhaps it should just be left as-is.

James-Matthew-Watson avatar Apr 15 '20 19:04 James-Matthew-Watson

Hi @James-Matthew-Watson.

Thanks for your suggestions.

My concerns about having just a pre-parsed representation are:

  1. The format you've proposed doesn't include all the fields from the certificate. It might be OK for your use cases, but it's not sufficient for other use-cases. (E.g. I'm testing that the TLS client certificate complies with various standards, so my tests need access to pretty much all fields of the certificate).

  2. What happens when a newly-written specification adds new features to the certificate? By definition, none of the existing parsers will handle it, and there may be a long delay until the data format spec and all existing implementations have been updated. This is also a significant amount of ongoing work for the spec writers and ASGI web server authors, to support specs that may be of interest to only a small number of people. They might also be proprietary specs, so I suspect open source developers would not be keen to add support for them.

  3. There would need to be a specification for the data format. The documentation for getpeercert() doesn't actually specify the data format in a way that would allow someone else to reimplement it and get the same result. And I think that's important, we can't assume everyone uses Python's "ssl" module to implement TLS, especially for high performance web servers that are unlikely to be written in Python. Writing that spec is a chunk of work.

  4. It makes it harder for every ASGI web server to implement this standard, as they have to parse the certificate correctly to generate the parsed data.

So my preference is for ASGI to just to provide the PEM data, and then middleware/frameworks (e.g. Django) can provide access to a parsed version if they wish to. In this case, the API to the parsed data would be part of Django so could be extended/changed as needed following Django's versioning rules, without worrying about having to update all ASGI web servers and frameworks at the same time.

As an alternative, if you really want some pre-parsed data at the ASGI level, can I suggest having both the full PEM data, and a pre-parsed version that just contains a few key fields to support the most common use-cases? That eliminates my objections (1) and (2), it means people doing weird and/or cutting-edge things can parse the PEM certificate to get all the data they want. You would still need to write a spec, and all ASGI web servers would have to implement it. In this case, I suggest that perhaps the Subject fields from the client certificate are probably all you need - that will cover the most common use case of "I just want to identify the user".

Jon-Work avatar Apr 16 '20 10:04 Jon-Work

@Jon-Work You have some fair points here, to but sure but I'm not sure I follow all of it. The assumption I am working off of is that since client-certificates are validated during the handshake, any server that supports them would need to be able to be able to handle parsing the incoming certificate. From what I've seen in the ASGI servers I have looked at, this happens before the scope is created. I'm not sure how that works if certificates that the server doesn't understand the format of the certificate being presented. Are you expecting the client-certificate to be a pass-through? I can only see that being feasible if the client-certificate is not evaluated during the handshake which I don't believe would be a proper implementation of TLS. Based on all that, it doesn't seem like a major burden to format the information in the certificate as a dictionary based on the relevant RFC(s).

Having said all that, I definitely can see how for a more intense use case like yours that having the actual certificate in it's full form would be useful or even necessary. And having the PEM sure beats having no information about the client certificate at all. And yes, for my use-case (which I imagine is the most common) I am only interested in the subject. The issuer chain is also desirable since it is not strictly the case that all subject will be unique across issuers and it can useful to define authorizations based on issuer.

So I'm on-board with the PEM approach. Would you be willing to lend a hand in defining a POC/PR? @hashbrowncipher has already identified where this change would need to be made but I haven't seen any response to your proposal. The only part I would need to work out is how to convert the binary result into a PEM encoded string (this might be trivial, I haven't begun to look.)

James-Matthew-Watson avatar Apr 16 '20 14:04 James-Matthew-Watson

Re the server not understanding a certificate: A certificate contains fields that are marked as either "critical" or "non-critical". if the server (or anyone else parsing a certificate) doesn't understand a "non-critical" field it's supposed to ignore it; it's only if the server doesn't understand a "critical" field that it's supposed to reject the certificate. But for what I'm doing, I might understand fields that the server doesn't.

I haven't tried this, but according to the docs, the widely used "cryptography" library can do PEM/DER conversions: https://cryptography.io/en/latest/x509/reference/

from cryptography.x509 import load_pem_x509_certificate
from cryptography.x509 import load_der_x509_certificate
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.serialization import Encoding

# Load a PEM certificate (should start "-----BEGIN CERTIFICATE-----"):
cert = load_pem_x509_certificate(pem_data_bytes, default_backend())
# Convert to a DER (binary format) certificate:
result_bytes = cert.public_bytes(Encoding.DER)

# Load a DER (binary format) certificate:
cert = load_der_x509_certificate(der_data_bytes, default_backend())
# Convert to a PEM certificate (should start "-----BEGIN CERTIFICATE-----"):
result_bytes = cert.public_bytes(Encoding.PEM)

For your use case, would adding the following extra field be suitable?:

  • tls_client_cert_names (Iterable[Iterable[Tuple[str, str]]]) -- An iterable of x509 Distinguished Names, where each name is represented as an iterable of parts. Each part consists of a kind (common name, country, etc (TODO need to define canonical list of these strings and put it in the spec)) and the value of that part. (TODO: Details about decoding or not IDNs, etc). The first entry is the subject of the client certificate; the second entry is the issuer of the client certificate. Any subsequent entries are part of the certificate chain sent by the client, with each name being the issuer of the certificate that has a subject matching the preceding name. Only applicable if the connection was over TLS; for non-TLS connections or if the client did not provide a client certificate then it will be an empty iterable. Some web server implementations may be unable to provide this (e.g. if TLS is terminated by a separate proxy or load balancer). Optional; defaults to empty list. If tls_client_cert_chain is provided and non-empty then this field must be provided and must contain information that is consistent with tls_client_cert_chain.

I'm happy to help, but my availability may be limited, so it would be better if you took the lead on this, if you're willing to do that?

Hopefully the bullet points I've written can become part of a spec PR with just minor editing, I tried to write them in "spec language".

Jon-Work avatar Apr 16 '20 17:04 Jon-Work

@Jon-Work Sorry for the delayed response. I got a little busy on Friday. I will take a crack at it once I have a little time and follow up. Thanks.

James-Matthew-Watson avatar Apr 20 '20 13:04 James-Matthew-Watson

Very interested in seeing any progress on this proposal. Adding this to the spec would seem to provide one of the final missing pieces of information about a connection to the server.

As for the formatting I am inclined to lean towards what @Jon-Work has proposed using PEM encoded strings as it seems the most portable for the future. Effectively providing the raw data to the server/middeware and allowing each server/middleware implementation to determine the format that is provided to the end user.

mdgilene avatar Aug 09 '20 17:08 mdgilene

This would be tremendously useful. Mutual TLS for authentication is only partially useful without being able to identify the client. Uvicorn (and presumably other ASGI servers) refuses to implement anything like this until it is listed in the ASGI specification, but unfortunately this discussion seems to have died. What can we do to resurrect it?

AdmiralNemo avatar Aug 31 '20 20:08 AdmiralNemo

I've had a go at a proposal for this. See pull request. Comments and feedback welcome.

jonfoster avatar Sep 02 '20 18:09 jonfoster

Pull request updated. Comments and feedback welcome.

jonfoster avatar Oct 03 '20 18:10 jonfoster

I was going to put in a PR for this but the Zombie apocalypse side-tracked things a bit. I've actually got a working hack to pass some of the details along working in hypercorn. Not sure if that's useful right now.

On Sat, Oct 3, 2020 at 2:16 PM jonfoster [email protected] wrote:

Pull request updated. Comments and feedback welcome.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/django/asgiref/issues/112#issuecomment-703143491, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB55BSRUWW2H4NWPK3QMGR3SI5S6VANCNFSM4IKOUQVA .

James-Matthew-Watson avatar Oct 03 '20 20:10 James-Matthew-Watson

@James-Matthew-Watson An implementation of the proposed spec would be very useful - it would prove that it's actually possible to implement the spec. It's likely that while implementing it, you'll have comments on the spec that can make it better.

(The above applies even if the implementation code is a bit hacky - it matters that you correctly implement the spec, not how you do that).

jonfoster avatar Oct 04 '20 14:10 jonfoster

I had implemented passing along a few of the fields off of the certificate such as issuer and subject but that was before I got involved in this discussion. What Jon-Work proposed has a lot more to it. I then started looking at that on uvicorn and got into the PEM encoding of the certificate and found I could do it with standard libraries. I was thinking of doing both a uvicorn and hypercorn implementation but I'll look into getting one working; probably hypercorn.

On Sun, Oct 4, 2020 at 10:18 AM jonfoster [email protected] wrote:

@James-Matthew-Watson https://github.com/James-Matthew-Watson An implementation of the proposed spec would be very useful - it would prove that it's actually possible to implement the spec. It's likely that while implementing it, you'll have comments on the spec that can make it better.

(The above applies even if the implementation code is a bit hacky - it matters that you correctly implement the spec, not how you do that).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/django/asgiref/issues/112#issuecomment-703262436, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB55BSQJX7RRHR6YATETMITSJB72JANCNFSM4IKOUQVA .

James-Matthew-Watson avatar Oct 04 '20 17:10 James-Matthew-Watson

We have currently implemented the support to get client certificate details with NGINX + proxy headers (encoded PEM) but that method is missing actual chain but gets us forward and NGINX does certificate chain validation.

While looking at the problem I found that ASGI is missing the feature for direct passage thus I found this issue.

While looking at the comments there I wanted to point out that JWS spec is using JSON for the representation of the chain x5c field: https://datatracker.ietf.org/doc/html/rfc7515#section-4.1.6

What I see nice in this JWS spec is that whole chain is provided and simple encoding of DER is used to make it compact.

In case you are not aware DER is the "raw" form of the certificate in binary format and PEM has it in base64 encoded format with additional line breaks, header and footer. PEM format is often used in files as that allows whole chain to be provided while with DER there is no common way of doing that.

In our use case we have custom extended key usages in the certificates that we observe to see if certain extended key usages are allowed and then fish computer vs. user certificate details from SAN field. In theory we could observe parsed form of the certificate but somehow it feels better to have exact client certificate (chain) and analyze that. Using cryptography library as illustrated above is quite easy.

What is very important is that technical client certificate validation is done properly before forwarding the details for application. This removes the need for application to do full chain validation especially in the case where full chain information is not available.

Is there a need of help with the topic?

We have played with stack: nginx+uvicorn+starlette+fastapi

vesajaaskelainen avatar Jan 09 '22 10:01 vesajaaskelainen

This ticket is basically awaiting someone to sit down and fully write out the spec as a pull request, so we can get direct feedback on the format.

andrewgodwin avatar Jan 10 '22 04:01 andrewgodwin

This can be closed, I guess?

Kludex avatar Mar 13 '23 22:03 Kludex