batchspawner
internal_ssl + SlurmSpawner leads to certificate verification error
Bug description
I have set up a JupyterHub instance on my cluster's login node that uses SlurmSpawner
to spawn notebook servers on our cluster. I have verified that SlurmSpawner
works (wonderfully, btw) and that SSL works everywhere except between the Hub server and the spawned notebook servers. I was experimenting with JupyterHub's internal_ssl
feature, but as soon as I set it to True in the config I was met with this error:
```
[W 2020-09-19 20:15:21.818 SingleUserNotebookApp iostream:1432] SSL Error on 9 ('[IP]', 8081): [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)
[E 2020-09-19 20:15:21.819 SingleUserNotebookApp singleuser:434] Failed to connect to my Hub at https://[IP]:8081/hub/api (attempt 3/5). Is it running?
Traceback (most recent call last):
  File "/opt/jupyterhub/lib/python3.8/site-packages/jupyterhub/singleuser.py", line 432, in check_hub_version
    resp = await client.fetch(self.hub_api_url)
  File "/opt/jupyterhub/lib/python3.8/site-packages/tornado/simple_httpclient.py", line 330, in run
    stream = await self.tcp_client.connect(
  File "/opt/jupyterhub/lib/python3.8/site-packages/tornado/tcpclient.py", line 293, in connect
    stream = await stream.start_tls(
  File "/opt/jupyterhub/lib/python3.8/site-packages/tornado/iostream.py", line 1417, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)
```
I have looked at #31, #103, and jupyterhub/jupyterhub#2055 but I cannot find good documentation on this issue/what I am doing wrong.
Your personal set up
JupyterHub instance using SlurmSpawner to spawn notebook servers. The Hub instance is on the same machine as the Slurm login node.
- OS: Ubuntu 20.04.1 LTS (all nodes)
- Version:
```
jupyter core     : 4.6.3
jupyter-notebook : 6.1.4
qtconsole        : not installed
ipython          : 7.18.1
ipykernel        : 5.3.4
jupyter client   : 6.1.7
jupyter lab      : 2.2.8
nbconvert        : 6.0.3
ipywidgets       : 7.5.1
nbformat         : 5.0.7
traitlets        : 5.0.4
```
- Configuration:
  - jupyterhub.XXX.XXX is CNAME-d to a www server on our network, and all traffic is proxied through the www server to the login node (where JupyterHub is hosted).
  - infocube.XXX.XXX.XXX is the login node.
  - jupyterhub_config.py: https://pastebin.com/BJRb3NfP
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
I ran into the same problem and did some digging into it. I'm still new to JupyterHub, so please take everything here with a grain of salt.
The main issue is that, as it stands now, the singleuser process does not have the CA certificate and the SSL key it needs to talk to the Hub server. These are created by the create_certs() method in the base Spawner class, and BatchSpawner does not interfere with this. The certificates are created for the user session, and their locations are passed along in environment variables.
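For reference, the singleuser server locates those files through environment variables that the Spawner adds to the job environment when internal_ssl is enabled. The variable names are JupyterHub's; the paths below are purely illustrative:

```shell
# Set by Spawner.get_env() when internal_ssl is on (paths are examples only)
JUPYTERHUB_SSL_KEYFILE=/srv/jupyterhub/internal-ssl/user-alice/user-alice.key
JUPYTERHUB_SSL_CERTFILE=/srv/jupyterhub/internal-ssl/user-alice/user-alice.crt
JUPYTERHUB_SSL_CLIENT_CA=/srv/jupyterhub/internal-ssl/hub-ca_trust.crt
```

If the batch job cannot read the files at these paths, the singleuser process fails exactly as in the traceback above.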
There are two problems, though:

1. The created key/certs are owned and readable only by root. They need to be accessible by the user. If JupyterHub is configured to put them in a globally-accessible directory, then all you'd need to do is chown the files over to the user. If not, you would need to move/copy them to a place the user can access. The move_certs() Spawner method can be used to do both of these. The LocalProcessSpawner has an implementation that could be helpful in BatchSpawner.

2. Both the hub server and the node where the singleuser process runs must be listed in the alt names property of the SSL certificate. The catch is that the certificates must be created before the job is even submitted, and we have no idea which node the batch job will end up on until the process is about to start.
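To illustrate problem 1, a move_certs() override might look roughly like this. This is a minimal sketch, not batchspawner's actual behavior: the class name, the `/shared/jupyterhub-certs` directory, and the assumption that it sits on a filesystem the compute nodes can see are all mine.

```python
# jupyterhub_config.py -- sketch only; the target directory and class
# name are assumptions, not part of batchspawner.
import os
import pwd
import shutil

from batchspawner import SlurmSpawner

class CertSlurmSpawner(SlurmSpawner):
    def move_certs(self, paths):
        """Move the internal-SSL key/cert/CA to a user-readable location.

        `paths` is the dict produced by create_certs(), with keys
        'keyfile', 'certfile', and 'cafile'.
        """
        user = pwd.getpwnam(self.user.name)
        dest = f"/shared/jupyterhub-certs/{self.user.name}"  # assumed shared dir
        os.makedirs(dest, mode=0o700, exist_ok=True)
        moved = {}
        for key, src in paths.items():
            target = os.path.join(dest, os.path.basename(src))
            shutil.move(src, target)
            os.chown(target, user.pw_uid, user.pw_gid)
            os.chmod(target, 0o600)
            moved[key] = target
        os.chown(dest, user.pw_uid, user.pw_gid)
        return moved

c.JupyterHub.spawner_class = CertSlurmSpawner
```

The returned dict replaces the original paths, so the environment variables handed to the batch job point at the user-owned copies.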
I can't think of a batch-friendly way to solve problem 2 if the goal is to have only the specific node listed. The simplest workaround would be to list all possible nodes in the ssl_alt_names configuration entry. A slightly better implementation could use a pre-spawn hook to automatically add all nodes in the selected partition, or something to that effect.
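That pre-spawn-hook idea could be sketched like this. The node list is hardcoded and hypothetical; in practice you might query `sinfo` for the nodes in the target partition:

```python
# Sketch of a pre-spawn hook that widens ssl_alt_names to cover a whole
# partition. Hostnames here are placeholders.
PARTITION_NODES = ["node01.cluster.local", "node02.cluster.local"]

def partition_alt_names(nodes):
    """Build the 'DNS:' entries expected in ssl_alt_names."""
    return [f"DNS:{n}" for n in nodes]

def add_partition_nodes(spawner):
    """Append every partition node to the spawner's alt names, once."""
    for alt in partition_alt_names(PARTITION_NODES):
        if alt not in spawner.ssl_alt_names:
            spawner.ssl_alt_names.append(alt)

# In jupyterhub_config.py you would register the hook:
# c.Spawner.pre_spawn_hook = add_partition_nodes
```

Since the hook runs before create_certs(), the session certificate is issued with all of these names and remains valid on whichever node Slurm picks.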
I set up a quick test with a move_certs() implementation that gives the user access to the certificates, and a pre-spawn hook that adds the (hardcoded) node name to the spawner's ssl_alt_names at runtime. The config file's entry just lists the Hub server. These were enough to get internal SSL working.
Hi @leitec, I'd also like to use SlurmSpawner with internal_ssl enabled. Why is (2) necessary? Wouldn't it be enough if the client gets any valid signed certificate?
Hi @Hoeze, it's been a while since I looked at this, and things may have changed since then.
I recall that the internal SSL mode uses fairly strict certificate validation. If the hub server is not in alt names, the singleuser process can't provide the hub server with its address and port number. I think it's expected that you will add the hub server hostname there. But then, if the node where singleuser is running isn't listed in alt names, the hub server can't contact the singleuser server at the given address and port.
This refers to the back end certificates created by JupyterHub for each session when internal_ssl is enabled, not the server certificate used on the user-facing JupyterHub endpoint, in case that's what you meant by client.
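Concretely, that strict validation can be satisfied by making sure both endpoints end up in the certificate's alt names; the hostnames below are placeholders:

```python
# jupyterhub_config.py -- illustrative hostnames only
c.JupyterHub.internal_ssl = True

# Every internal certificate gets these subjectAltName entries, so both
# directions of hub <-> singleuser traffic can be verified:
c.Spawner.ssl_alt_names = [
    "DNS:hub.example.org",     # the Hub server
    "DNS:node01.example.org",  # a node where the singleuser server may run
]
```

If either name is missing, one side of the handshake fails with CERTIFICATE_VERIFY_FAILED, as in the traceback at the top of this issue.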
I see, thanks @leitec!