jupyter-server-proxy

Enable SSL on forwarded requests

Open • rcthomas opened this issue 5 years ago • 32 comments

This is my initial pass at #89. Happy to start iterating.

The main source of uncertainty for me is check_hostname on the SSL context. I tested without it set, fretted about that for three weeks, and then made it something you can configure. I looked at what both JupyterHub and Dask Distributed do, and each has a definite opinion. I made the default match what Dask does, because that is closest to our set of assumptions. I'm not sure the "True" case has everything else it needs to work.
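A minimal sketch of a configurable check_hostname on the client side, using only the standard-library ssl module. The helper name make_proxy_ssl_context and its parameters are hypothetical, not part of jupyter-server-proxy. It defaults check_hostname to False only on the assumption, suggested by the doubt about the "True" case above, that the PR leaves it off by default. Turning off hostname checking still leaves certificate-chain verification on. Python also refuses to set verify_mode to CERT_NONE while check_hostname is still enabled.

```python
import ssl
from typing import Optional


def make_proxy_ssl_context(
    cafile: Optional[str] = None,
    check_hostname: bool = False,
) -> ssl.SSLContext:
    """Build a client-side SSL context for talking to a proxied backend.

    Hypothetical helper: `cafile` is the CA (or self-signed) certificate used
    to verify the backend; `check_hostname` controls whether the backend's
    certificate must match the host name in the request URL.
    """
    # Purpose.SERVER_AUTH means "this context authenticates a server",
    # i.e. it is a client context. With cafile=None the system CAs are loaded.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    # Hostname checking is switched independently of chain verification;
    # verify_mode stays CERT_REQUIRED either way.
    context.check_hostname = check_hostname
    return context
```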

rcthomas • Jan 10 '20 19:01

This is IMHO an important pull request. Any chance it can be merged to master?

jhgoebbert • Feb 23 '20 19:02

There are questions about how this should work or be implemented and what the exact goal or use case is. To move the review process forward, I think we need an example that people can try out to help them understand the context. At least I think that would help sort out some of the questions that have come up. If you can prepare one that we can try at home, that would be a great contribution.

betatim • Feb 24 '20 07:02

We have a situation where compute nodes on our supercomputer are normally insulated from inbound connections from the outside world. The inside just has 10.x.x.x interfaces. Notebooks normally run outside those batch-scheduled nodes and users can keep notebook servers around persistently on those "outside" nodes, which are really just repurposed login nodes. These nodes do have an interface on the same network as the compute nodes. We use jupyter-server-proxy to forward things through to/from the computes, it's pretty neat.

Our users want to run Dask on the compute nodes in job allocations and talk to them from their persistent JupyterLab sessions, and we'd like to help them run SSL on the cluster and the dashboard. Yes, it's an option to set up the scheduler and dashboard alongside the notebook server on the front end, but network-wise this is less optimal. We are in the process of implementing JupyterHub end-to-end SSL as well. The Dask dashboard is the main focus for us with this pull request. There is a way to run the dashboard with SSL, but the requests that jupyter-server-proxy forwards to it also need SSL support, hence the SSL context here (it is not for the scheduler and workers themselves). The main goal is for each user to be sure that the entire path is running under SSL.

rcthomas • Feb 25 '20 17:02

Thanks for expanding your use case. I think a diagram of the requests between each of the components would be helpful. In particular I still don't understand why jupyter-server-proxy requires a private key. Typically the key would only be required for SSL if you're running a server. jupyter-server-proxy isn't a server, it's an add-on to jupyter-notebook which is the server. If the aim is to encrypt traffic between the dashboard and jupyter wouldn't that need to be configured on the dashboard side?

manics • Feb 25 '20 20:02

I'll try to get a diagram going.

Indeed the aim is to encrypt traffic between the dashboard and Jupyter, and it is configured on the dashboard side. But the client (jupyter-server-proxy) needs an SSL context to talk to it. That's how I envisioned this working at least...

rcthomas • Feb 25 '20 21:02

[diagram]

The hub and hub proxy are running on some servers managed by Rancher (Docker containers), and a load balancer (not shown) does the front-end termination. The hub starts Jupyter notebooks on a "login" node using a custom spawner. Now suppose a user submits a job with Slurm and starts up a Dask cluster inside the job allocation, including a scheduler, the dashboard, and a bunch of workers. These nodes running Dask are compute nodes and are not publicly accessible, so to see the dashboard running there we need jupyter-server-proxy; in fact, we needed #154 for that to work.

We'd like to be able to talk to the dashboard if it's using TLS, and I think that means we need the SSL context to be set up as a client in jupyter-server-proxy.
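To make the client/server split concrete: in Python's ssl module, a context created for Purpose.SERVER_AUTH is a client context, because it authenticates the server it connects to. A context created for Purpose.CLIENT_AUTH is a server context. Only the server side, here the Dask dashboard, has to load its own certificate and private key with load_cert_chain(). The sketch below just builds and inspects the two kinds of context. The file names in the comment are placeholders.

```python
import ssl

# Client side (the role jupyter-server-proxy plays): verifies the peer it
# connects to, so certificates are required and host names are checked.
client_ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

# Server side (the role the dashboard plays): presents a certificate and,
# by default, does not ask the connecting client for one.
server_ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
# The private key is loaded on the server, not the client, e.g.:
# server_ctx.load_cert_chain("dashboard.crt", "dashboard.key")  # placeholder paths

for label, ctx in (("client", client_ctx), ("server", server_ctx)):
    print(label, ctx.verify_mode.name, ctx.check_hostname)
```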

rcthomas • Mar 03 '20 22:03

Thanks for the diagram. I agree you only need an SSL client. I'm not convinced it requires a key, though. For example, suppose you wanted to proxy https://grafana.mybinder.org/ (there's no reason why this would be different from proxying an internal URL): which key and certificate would you use?

manics • Mar 14 '20 14:03

I wasn't thinking about that. My specific use case was for internal-ssl style self-signed certs (similar to how the Hub does it, e.g. via certipy, or possibly even re-using some of that infrastructure). At this point I suppose there are a huge number of other use cases we might want to worry about, but I didn't realize I'd signed up to handle all of those...

If it's just a simple change of making an argument optional or something I don't see an issue at all but if we need a more fully-featured PR than that perhaps someone with more experience with SSL needs to do it?

rcthomas • Mar 14 '20 16:03

A self-signed cert would require the internal CA or the public self-signed certificate to verify it, but it still shouldn't require the private key.

manics • Mar 14 '20 16:03

See for example this Stack Overflow question about the requests module: https://stackoverflow.com/questions/30405867/how-to-get-python-requests-to-trust-a-self-signed-ssl-certificate

Only the server's public certificate should be required.
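A minimal client-side sketch of the self-signed case. The server's public certificate, or the internal CA that signed it, is loaded as a trust anchor with load_verify_locations(), and no private key is involved. The function name trust_self_signed and the path are placeholders. With the requests library, the equivalent is passing the same file as verify=.

```python
import ssl


def trust_self_signed(server_cert_pem: str, check_hostname: bool = True) -> ssl.SSLContext:
    """Hypothetical helper: a client context that trusts one self-signed
    (or internal-CA) certificate. Only the *public* certificate is needed;
    the server keeps its private key to itself."""
    # PROTOCOL_TLS_CLIENT starts with verify_mode=CERT_REQUIRED and
    # check_hostname=True.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = check_hostname
    # Trust anchor only; raises an OSError if the file does not exist.
    ctx.load_verify_locations(cafile=server_cert_pem)
    return ctx


# Usage (placeholder path):
# ctx = trust_self_signed("/path/to/dashboard.crt")
```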

manics • Mar 14 '20 16:03

OK, so you mean:

ssl_context.load_cert_chain(serverproxy.certfile, serverproxy.keyfile or None)

If keyfile is left empty, it falls back to the default for load_cert_chain: https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_cert_chain

rcthomas • Mar 15 '20 01:03

According to the httpclient.HTTPRequest documentation, you can pass the server certificate(s) as ca_certs directly to the tornado.httpclient.HTTPRequest() call. I think this would be clearer than using the SSL context object, if only because the documentation is clearer.

What do you think?

manics • Mar 28 '20 21:03

When originally drafting this PR I investigated how JupyterHub does this. There's a function there called make_ssl_context(). It's more general purpose than what I'm doing here, because it handles both server and client auth. This is called, for instance, to create the default for hub_http_client.

While reading up on this I also saw the documentation you're referencing, but when I saw that JupyterHub was already doing it one way, I figured that if the SSL context was good enough for JupyterHub, it was probably good enough for jupyter-server-proxy!

rcthomas • Mar 30 '20 15:03

Fair enough! Let's stick with the ssl-context for now then.

In addition to the load_cert_chain method you mentioned there's also a load_default_certs method, so how does the following sound:

  • Internal SSL certificate (your use case): ssl_context.load_cert_chain(serverproxy.certfile, serverproxy.keyfile or None)
  • Public SSL certificate (e.g. proxying a public webserver, or an internal service that uses a certificate from a recognised CA): ssl_context.load_default_certs()

manics • Mar 31 '20 11:03

Just wanted to bump this to see where this is :) There's a merge conflict already, would be great to get this in before there are too many of those!

Thanks for the thorough review, @manics.

yuvipanda • Apr 20 '20 06:04

I can add an internal_ssl configuration option that would decide between load_cert_chain and load_default_certs, unless there is a better idea. Before pushing further changes to the PR, I'd like to know whether that's what @manics has in mind: an explicit configuration option. I'll try to deal with the merge conflict and whatever guidance there is by the end of the week.

rcthomas • Apr 20 '20 16:04

Instead of another property, internal_ssl, how about only using the existing properties:

  • https: True, other properties unset: use load_default_certs
  • https: True, one or more other properties set: use load_cert_chain?
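One way the rule above could be expressed, as a hedged sketch. The parameters https, certfile, keyfile and cafile mirror the properties discussed in this thread, but build_proxy_ssl_context itself is hypothetical. In Python's ssl module, load_cert_chain() loads the certificate this side presents, which for a client means mutual TLS. Trusting an internal server certificate is done with load_verify_locations(), so the sketch handles the two separately. It also rejects a keyfile without a certfile, the case raised a few comments later.

```python
import ssl
from typing import Optional


def build_proxy_ssl_context(
    https: bool,
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    cafile: Optional[str] = None,
) -> Optional[ssl.SSLContext]:
    """Hypothetical: choose the SSL setup from the existing properties only."""
    if not https:
        return None  # plain http: no SSL context needed

    if keyfile and not certfile:
        # A private key is unusable without the certificate that matches it.
        raise ValueError("keyfile is set but certfile is not")

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cafile:
        # Internal/self-signed case: trust the given CA or server certificate.
        ctx.load_verify_locations(cafile=cafile)
    else:
        # Public case: trust the system's default CA certificates.
        ctx.load_default_certs(ssl.Purpose.SERVER_AUTH)

    if certfile:
        # Client certificate for mutual TLS; with keyfile=None the key is
        # read from certfile itself.
        ctx.load_cert_chain(certfile, keyfile or None)
    return ctx
```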

manics • Apr 21 '20 08:04

Thanks @manics I'll try that, probably Wed or Thu for the update.

rcthomas • Apr 21 '20 14:04

I think there might be an issue with keyfile being set but not certfile. Man I hate this SSL stuff.

rcthomas • Apr 24 '20 00:04

I'll see if I can push some changes so that my test will pass.

manics • Apr 26 '20 21:04

I think SSL is cursed!

I managed to get my test passing: https://github.com/jupyterhub/jupyter-server-proxy/compare/master...manics:pr169?expand=1

To do this I moved the SSL configuration from the top-level global config to a per-server config (so https can be enabled for individual proxied servers).

I then (finally!?) understood @rcthomas's use case, which is to enable SSL proxying using something like /proxy/REMOTE_HOST:port/. However setting the SSL config globally would force https on all ports not just individual ones.

Suggestion: use https for ports that aren't using a managed process by extending the remote-host format to optionally allow a protocol:

  • /proxy/<PROTOCOL>:<REMOTE_HOST>:<PORT>/
  • /proxy/absolute/<PROTOCOL>:<REMOTE_HOST>:<PORT>/

Alternatively, instead of <PROTOCOL>:<REMOTE_HOST>:<PORT> this could be <PROTOCOL>://<REMOTE_HOST>:<PORT>, as this more naturally represents a server, but this will depend on whether a double // in a URL path is handled correctly.

Current URL handlers: https://github.com/jupyterhub/jupyter-server-proxy/blob/b9b4d22fd78e6a267164eb828463166ed0269b28/jupyter_server_proxy/handlers.py#L562-L569
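A rough sketch of how the optional protocol prefix could be parsed. It is written as a standalone regular expression, not as a change to the Tornado URL handlers linked above. The pattern, the Target tuple, the function name and the "http" default are all assumptions for illustration. It accepts /proxy/<REMOTE_HOST>:<PORT>/... as well as /proxy/<PROTOCOL>:<REMOTE_HOST>:<PORT>/..., both with an optional absolute/ segment.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: optional "absolute/", optional "<protocol>:" prefix,
# then "<host>:<port>" and the rest of the path.
PROXY_PATH = re.compile(
    r"^/proxy/(?P<absolute>absolute/)?"
    r"(?:(?P<protocol>https?):)?"
    r"(?P<host>[^/:]+):(?P<port>\d+)"
    r"(?P<path>/.*)?$"
)


class Target(NamedTuple):
    absolute: bool
    protocol: str
    host: str
    port: int
    path: str


def parse_proxy_path(url_path: str) -> Optional[Target]:
    """Split a /proxy/... path into its parts; None if it does not match."""
    m = PROXY_PATH.match(url_path)
    if m is None:
        return None
    return Target(
        absolute=m.group("absolute") is not None,
        protocol=m.group("protocol") or "http",  # no prefix: keep plain http
        host=m.group("host"),
        port=int(m.group("port")),
        path=m.group("path") or "/",
    )
```

For example, parse_proxy_path("/proxy/https:10.0.0.5:8787/status") selects https for that one host and port only, leaving every other remote host on plain http.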

@yuvipanda what do you think?

manics • Apr 28 '20 11:04

Hey @manics thanks for the help during my absence here, I am back. Sigh. SSL. Sigh.

You make a good point about how I would be forcing SSL on every remote_host:port combination. In my case I'm mostly worried (that's too strong a word; concerned, maybe?) about one very dominant use case (Dask). That said, there are other things out there and I don't want to make things horrible for those users just because.

I have no opinion on :// (except that as an emoji it summarizes my feelings about SSL); I do remember that before I encountered the {remote_host}:{port} PR I had just implemented it with a / in between. That is, I don't think we have to make it look like a URL in there though I see how it makes it clearer what's going on. I'd be happy with just /'s between everything.

rcthomas • May 19 '20 20:05

To perform some thread necromancy, I think forcing SSL on a per-server basis would mean we don't have to mangle URLs, right? I want folks to be able to use the dask dashboard without having to know if it's over https or http - especially without us redirecting properly. So we can give admins control over this on a per-server basis, and force https whenever it is set.

I think that's what I see in https://github.com/jupyterhub/jupyter-server-proxy/compare/master...manics:pr169?expand=1? If so, I think we can just merge this and iterate.
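A sketch of the per-server idea. The scheme for the upstream request comes from that server's own configuration, so the user-facing /proxy/... URL is the same whether the backend speaks http or https. The SERVERS dictionary, its https key and the server names are hypothetical stand-ins, not the actual jupyter-server-proxy configuration schema.

```python
from typing import Any, Dict

# Hypothetical per-server settings, keyed by the name used in the proxy URL.
SERVERS: Dict[str, Dict[str, Any]] = {
    "dask-dashboard": {"host": "10.0.0.5", "port": 8787, "https": True},
    "legacy-app": {"host": "localhost", "port": 3000},  # https unset -> http
}


def upstream_url(name: str, path: str) -> str:
    """Build the backend URL for a proxied request to server `name`."""
    cfg = SERVERS[name]
    # https is forced whenever the admin sets it for this server.
    scheme = "https" if cfg.get("https", False) else "http"
    return f"{scheme}://{cfg['host']}:{cfg['port']}{path}"
```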

Thank you for persevering with this, @rcthomas and @manics!

yuvipanda • Sep 30 '20 07:09

We would have a similar need for SSL support in jupyter-server-proxy.
Here it is not the Dask dashboard but a user's noVNC process on a multi-user system, similar to @yuvipanda's jupyter-desktop-service, which provides the user with a remote desktop in their browser.
The user's noVNC process runs on one of our nodes and provides access to the user's vncserver through http/https. Without https between noVNC and jupyter-server-proxy it would be a pretty insecure setup on a multi-user cluster, wouldn't it?

jhgoebbert • Oct 25 '20 12:10

Hi, I want to use jupyter-server-proxy to forward some requests to a web server that runs in the kernel. As I understand it, because mybinder uses https, my web server must also use it; otherwise there is a mixed-content (secure vs. insecure) issue. I tried to set up SSL in my web server with a self-signed certificate, but it is not accepted. Would this PR solve my issue? Do you plan to merge it soon? Thanks!

davidbrochart • Nov 01 '20 14:11

@davidbrochart https is only required for external connections between your browser and mybinder. If your webserver is running inside your Docker container then plain http should work since communication will be fully within the container (between the notebook process and your webserver). External connections will automatically be https.

manics • Nov 01 '20 16:11

> @davidbrochart https is only required for external connections between your browser and mybinder.

But I think this is the case, my server is running in the kernel which is inside the Docker container, but the requests are made from the browser. So I understand that my server must support SSL, and that it should get the certificate from mybinder, but I'm not sure if this PR will solve my issue.

davidbrochart • Nov 02 '20 00:11

@davidbrochart If the server you are talking about is running in the container alongside the notebook server, you shouldn't need to worry about SSL; you just proxy that server's port normally. You only care if the server is off somewhere else and you want to encrypt traffic between your notebook and the service. That's what this was originally about, in particular the case where the network path between jupyter-server-proxy and the back-end service is not going over the internet.

rcthomas • Nov 04 '20 00:11

Yes, the server is running in the container, and I would be fine with no SSL, but because the browser is going to make requests from an https URL, I am forced to serve through https as well; otherwise there is a mixed-content issue. Does that make sense?

davidbrochart • Nov 04 '20 03:11

@davidbrochart I think we'll need to see a diagram of all your components and the communication between them.

In order to keep this discussion focused on the PR would you mind re-asking your question and adding the additional information on the Jupyter Community Forum https://discourse.jupyter.org/ instead and I'll follow up there?

In doing so you'll also be helping others working with jupyter-server-proxy, I'm sure others have run into the same problem so it'd be valuable to have this as a community post. Thanks 😀.

manics • Nov 04 '20 07:11