configurable-http-proxy
configurable-http-proxy copied to clipboard
[WIP] If multiple x-forwarded-proto use first
See
- https://github.com/tornadoweb/tornado/pull/2162
- https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1700#issuecomment-709555566
@consideRatio do you reckon this would work?
Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
Tornado inspection
Currently tornado cares first and foremost about the x-scheme header, and secondly about the x-forwarded-proto header, but if the x-forwarded-proto header isn't just a single entry of http or https, it will be silently ignored. This logic can be inspected here.
My conclusion
@manics, I think from this it follows that we must configure CHP to pass only a single value in x-forwarded-proto to be tornado compatible, and that value in my mind, should be the leftmost entry, which is exactly what you do in this PR. So, in my mind, this is correct.
Verification from Min
@minrk can you help us review this change by answering the following questions?
- Do you agree JupyterHub's tornado should configure itself using the protocol as requested by the client/browser?
- Do you agree with me and @manics that the leftmost entry in x-forwarded-proto (assuming it is a comma separated list) is representing the client/browser requested protocol? My belief stems from wikipedia of x-forwarded-for and feels a bit shakey as its not x-forwarded-proto specifically.
Ideally this would be handled in JupyterHub since it's the final consumer of the header, but after looking at https://github.com/tornadoweb/tornado/pull/2162 I can't see an obvious way to do that.
Note there are some subtle issues with whether you can trust the header, the Django docs provide a good explanation: https://docs.djangoproject.com/en/3.1/ref/settings/#secure-proxy-ssl-header
However that's separate to this issue where we're deciding which part of the header to use rather than whether to trust it or not.
This is creating a trust issue by discarding information. I'm not sure we want to do it this way. If we want to change what information is used in tornado, we should probably do it in the tornado app.
proxy headers are hard because there's an arbitrary number of layers of proxy headers and you have to decide how many layers you want to trust. E.g. when you have:
- external proxy (e.g. malicious proxy)
- cloud load balancer
- ingress-controller
- chp
we want to make sure to trust forwarding headers from everything but the proxy outside our own application. The most conservative approach is to trust only the last hop, which I think is what tornado does, but what we care about most of the time is what it looks like to the browser, which is the first entry.
Tornado I think lets you actually configure to some degree which things are trusted, but it's hard to do correctly in general.
This is why I think it was a huge mistake by the IETF when they added the Origin header to only include it on some cross-origin requests and not on same-origin requests, because it means we can't reliably use it, as cases where it's missing are not conclusively cases where it should be allowed.
I'm not particularly happy with this solution either, ideally there'd be a way to control how Tornado handles the header, or provide an option to pass it through unchanged to JupyterHub.
I assumed that since you couldn't resolve the problem with https://github.com/tornadoweb/tornado/pull/2162 it was unlikely I'd be able to do anything.... If you can see a way to do it on the JupyterHub side I'm more than happy to go along with that.
An alternative might be the django route where you can configure your first proxy to set a header of your choice instead of x-forwarded-proto or x-scheme, though that's more work for a user.
I'm trying to figure out whether JupyterHub.trusted_downstream_ips should provide a solution instead of this: https://discourse.jupyter.org/t/recommended-use-of-x-forwarded-proto-with-nginx/8035/3