HTTPS proxies support
Refs #1424, #1428
Terminology
- HTTP proxy: a proxy server which supports connecting to it via HTTP. HTTP requests are forwarded, HTTPS requests are tunneled (via HTTP CONNECT). HTTPX has good support for those, no questions asked. ✅
- HTTPS proxy: a proxy server that supports connecting to it via HTTPS. Only HTTPS requests are supported, and must (?) be tunneled (TLS-in-TLS). This is what seems to be missing still. ❌
Problem statement
We've seen reports of issues recently such as #1424 and #1428 that reveal that our proxies implementation does not properly support HTTPS proxies yet.
My understanding right now is that supporting this requires implementing a technique known as "TLS-in-TLS" (or perhaps "nested TLS"). Here's how that works:
1. HTTPX issues a CONNECT request to the proxy, at https://proxy.org. This may use a dedicated proxy_ssl_context with proxy-specific certs, which I'll mark as TLS(p) ("p" as "proxy").
2. The proxy establishes a TCP tunnel to the target. The HTTPX-proxy "half" of the TCP connection is over TLS(p), and the other one is not TLS-enabled yet.
3. HTTPX must perform a TLS handshake with the target server, so that the proxy-server "half" of the TCP connection becomes encrypted over TLS(t) ("t" as "target"). I'm not 100% certain I understand how that works. Right now I assume TLS handshake packets would be sent TLS(p)-encrypted to the proxy, which "decrypts" them and sends them to the server for us. The server responds with its SERVER HELLO. The proxy "encrypts" it back over TLS(p) and HTTPX sees it. The second pass of the handshake follows the same pattern. (I've put quotation marks because this is done without the proxy being able to actually intercept those packets. Could anyone confirm?)
✅ Right now we can do steps 1/ and 2/, with the nuance that we have a single ssl_context option that's used for both proxy CONNECT and the handshake (we'd want to have proxy_ssl_context and ssl_context).
❌ What is definitely missing is step 3/.
Right now we attempt to do start_tls(), as if we were tunneling over a standard HTTP connection with the proxy. That generally fails in a variety of ways depending on sync / async, async library, custom certs, HTTP/1.1 vs HTTP/2, proxy server implementation, etc.
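For illustration, here is a minimal stdlib-only sketch of the three steps above. It is not how HTTPX or httpcore implement this. The helper names (build_connect_request, connect_succeeded, handshake_over_tunnel) are made up for this example, and it assumes `outer` is an already-connected socket wrapped with TLS(p). The key idea is that the TLS(t) handshake runs over in-memory buffers (ssl.MemoryBIO), and the resulting bytes are shuttled through the TLS(p) connection rather than written to the raw socket.

```python
import ssl


def build_connect_request(host: str, port: int) -> bytes:
    """Step 1: the CONNECT request sent to the proxy over TLS(p)."""
    authority = f"{host}:{port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def connect_succeeded(response_head: bytes) -> bool:
    """Step 2: any 2xx status line means the proxy opened the tunnel."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return len(parts) >= 2 and parts[1].startswith(b"2")


def handshake_over_tunnel(outer, target_ctx: ssl.SSLContext, server_hostname: str):
    """Step 3: run the TLS(t) handshake *through* the TLS(p) connection.

    `outer` is assumed to behave like an ssl.SSLSocket connected to the
    proxy (only sendall() and recv() are used). Returns an ssl.SSLObject
    whose read()/write() operate on the in-memory buffers.
    """
    incoming = ssl.MemoryBIO()  # TLS(t) bytes received from the target
    outgoing = ssl.MemoryBIO()  # TLS(t) bytes to be sent to the target
    inner = target_ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)

    while True:
        try:
            inner.do_handshake()
            done = True
        except ssl.SSLWantReadError:
            done = False

        # Whatever TLS(t) produced is sent as application data over TLS(p).
        pending = outgoing.read()
        if pending:
            outer.sendall(pending)
        if done:
            return inner

        # Feed the target's reply (relayed by the proxy) back into TLS(t).
        chunk = outer.recv(65536)
        if not chunk:
            raise ConnectionError("tunnel closed during the TLS(t) handshake")
        incoming.write(chunk)
```

With this shape, the proxy only ever sees TLS(t) records as opaque bytes inside TLS(p). After the handshake, application data would follow the same pattern: inner.write() followed by flushing `outgoing` through `outer`.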
To reproduce
Right now the following would fail:
import httpx

proxies = {"https": "https://proxy.org:443"}
with httpx.Client(proxies=proxies) as client:
    response = client.get("https://example.org")
TODO: full pproxy setup (or perhaps proxy.py, which seems to support HTTPS proxying fully), full sample tracebacks.
Additional context
Marked this as "requests-compat" because this is now supported in urllib3 as of 1.26. It landed via this pull request: https://github.com/urllib3/urllib3/pull/1923. AFAICT they had to implement TLS-in-TLS themselves, overriding the standard http.client connection implementation because that one doesn't support TLS-in-TLS.
Marked this as "httpcore" because our proxy implementation lives there: https://github.com/encode/httpcore
Hi there. In #1428 I had a problem with a third type of proxy not mentioned here: a "reverse proxy" with SSL termination. It works like this:
- The client establishes a TLS connection with the proxy and sends a "message" over the socket (I'll provide HTTP/1.1 examples, but my problem was with HTTP/2):
GET /index.html HTTP/1.1
Host: example.com
- The server decrypts this "message", chooses the destination server, makes the same request as the client but without TLS, and responds to the client with the answer it received.
It's very useful for load balancing, for example. The client doesn't even know that it is speaking to a proxy. Maybe this is redundant information, but I just want to be clear.
@ech0-py Well, then it seems like you don't want to use proxies in that case?
Just send a request to the target host, as you would if there was no reverse proxy in place. I assume things must be set up in your infra so that the DNS hostname resolves to the reverse proxy IP and traffic is directed there, but the reverse proxy isn't really a "proxy" in the sense that we're discussing in this issue.
Reverse proxies aren't supposed to be passed as proxies specifically because they're not really web proxies, but servers that defer requests to other servers, eg for the sake of load balancing, as you mentioned. Or am I missing something?
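To illustrate the distinction, here is a small self-contained sketch using only the standard library. It is a toy setup, not the one from #1428: TLS termination is left out to keep it short, and the names Backend, make_reverse_proxy, serve and fetch_via_reverse_proxy are made up for this example. The point it shows is that the client uses no proxy configuration at all; it simply sends its request to the reverse proxy's address as if that were the origin server.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# An opener that ignores any proxy settings from the environment:
# no forward proxy is involved anywhere in this example.
DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class Backend(BaseHTTPRequestHandler):
    """The destination server, hidden behind the reverse proxy."""

    def do_GET(self):
        body = f"backend saw {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def make_reverse_proxy(backend_url: str):
    class ReverseProxy(BaseHTTPRequestHandler):
        """Repeats each GET against the backend and relays the answer."""

        def do_GET(self):
            with DIRECT.open(backend_url + self.path) as upstream:
                body = upstream.read()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass

    return ReverseProxy


def serve(handler) -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_reverse_proxy(path: str = "/index.html") -> str:
    backend = serve(Backend)
    proxy = serve(make_reverse_proxy(f"http://127.0.0.1:{backend.server_port}"))
    try:
        # The client addresses the reverse proxy directly, like any server.
        with DIRECT.open(f"http://127.0.0.1:{proxy.server_port}{path}") as resp:
            return resp.read().decode()
    finally:
        for server in (proxy, backend):
            server.shutdown()
            server.server_close()
```

Calling fetch_via_reverse_proxy() returns the backend's response even though the client never learned the backend's address, which is why a reverse proxy does not belong in a proxies=... setting.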
@florimondmanca I think you're right. But in that case I need to set up NSS (nsswitch.conf) at least, which requires sudo. Is it OK to continue the discussion about this in #1428, since I feel my messages aren't related to this topic?
Hi. Any update on this?
Note to self that requests doesn't support connecting to an HTTPS proxy. Note eg. that there's simply no API for specifying the proxy cert.
This project README has a useful set of past issues referencing this... https://github.com/phuslu/requests_httpsproxy
However, urllib3 does support HTTPS proxies, including the TLS-in-TLS required to connect to an HTTPS website through an HTTPS-secured proxy. Their docs on this are better than anything else I've seen... https://urllib3.readthedocs.io/en/latest/advanced-usage.html#http-and-https-proxies
Am currently doing some digging into this, and looking into what it'll take in order for us to support tls-in-tls across all our backends.
There's a PR to add support for this to requests, since urllib3 now supports it... https://github.com/psf/requests/pull/5665
I'm going to document an example of what's needed in order to demo urllib3's support for this...
Generate keys/certs for the proxy itself to use, with trustme:
$ venv/bin/trustme-cli
Generated a certificate for 'localhost', '127.0.0.1', '::1'
Configure your server to use the following files:
cert=/Users/tomchristie/GitHub/encode/httpx/server.pem
key=/Users/tomchristie/GitHub/encode/httpx/server.key
Configure your client to use the following files:
cert=/Users/tomchristie/GitHub/encode/httpx/client.pem
Start a secure proxy with proxy.py:
$ venv/bin/proxy --port 6000 --hostname 127.0.0.1 --cert-file server.pem --key-file server.key
2021-05-24 09:51:05,093 - pid:25180 [I] load_plugins:334 - Loaded plugin proxy.http.proxy.HttpProxyPlugin
2021-05-24 09:51:05,093 - pid:25180 [I] listen:115 - Listening on 127.0.0.1:6000
2021-05-24 09:51:05,103 - pid:25180 [I] start_workers:136 - Started 6 workers
Send the HTTP request, using urllib3:
import certifi
import urllib3
from urllib3.util.ssl_ import create_urllib3_context

# Trust the trustme-generated CA when verifying the proxy's own certificate.
proxy_ssl_context = create_urllib3_context()
proxy_ssl_context.load_verify_locations("client.pem")

# ca_certs verifies the target site; proxy_ssl_context verifies the proxy.
http = urllib3.ProxyManager(
    'https://127.0.0.1:6000/',
    ca_certs=certifi.where(),
    proxy_ssl_context=proxy_ssl_context,
)
r = http.request('GET', 'https://example.com/', retries=False)
print(r.status)
@florimondmanca, do you have a schedule, or at least a general idea of when the HTTPS proxy mode (TLS-in-TLS) will be implemented? We have a client that needs it because of their security measures, and we are in a bit of trouble, not being able to provide them with software that can connect to APNs via their proxy server.
I am working on this and have made fair progress. I will raise a PR in the next 1-2 weeks for sure. The TLS-in-TLS concept is roughly working.
If it helps in any way, we have added secure proxies support in the latest version of httpx-socks
Awesome!!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Needs an update, but still valid. Thanks, bot.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Steady on, bot.
Would love to see this implemented.
We have support for this in httpcore now.
See https://github.com/encode/httpcore/pull/745 and https://github.com/encode/httpcore/pull/786.
We should extend this into httpx, with an API like...
proxy = httpx.Proxy("https://", ssl_context=...)
client = httpx.Client(proxies=proxy)
Aside...
Do I dislike the proxies=... API? Yes I do.
Is adding proxy_ssl_context to the existing API sufficient for this ticket? Yes it is.
Should we provide low-level proxy_ssl_context access rather than high-level verify, certs arguments?
We should add an ssl_context=... parameter to the httpx.Proxy(...) configuration class. We don't want anything more than that.
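As a rough sketch of what callers would prepare for such an API, the snippet below builds two separate contexts with the standard library: one for TLS(p) towards the proxy and one for TLS(t) towards the target. The helper name make_ssl_contexts and its proxy_ca_file parameter are assumptions for illustration only. The commented-out httpx usage mirrors the API proposed above and is not confirmed here; the proxy address and client.pem file are borrowed from the proxy.py/trustme demo earlier in the thread.

```python
import ssl
from typing import Optional, Tuple


def make_ssl_contexts(
    proxy_ca_file: Optional[str] = None,
) -> Tuple[ssl.SSLContext, ssl.SSLContext]:
    """Return (proxy_ssl_context, ssl_context).

    proxy_ssl_context verifies the proxy's certificate, optionally against
    a private CA bundle. ssl_context verifies the target server against
    the system trust store.
    """
    proxy_ssl_context = ssl.create_default_context(cafile=proxy_ca_file)
    ssl_context = ssl.create_default_context()
    return proxy_ssl_context, ssl_context


# Hypothetical usage, following the API shape proposed in this thread:
#
#   proxy_ssl_context, _ = make_ssl_contexts("client.pem")
#   proxy = httpx.Proxy("https://127.0.0.1:6000", ssl_context=proxy_ssl_context)
#   client = httpx.Client(proxies=proxy)
```

Keeping the two contexts separate means a private CA trusted for the proxy is never implicitly trusted for target servers.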