dask-labextension
Access dashboard from scheduler with no public IP?
Is it possible to connect to a scheduler without a public IP running in the same AWS VPC?
What happened:
I'm running JupyterLab on an AWS EC2 instance and I'm trying to connect to a scheduler with a private IP. The scheduler was created with dask-cloudprovider and runs in the same VPC as the lab.
The dashboard gets picked up by the extension as expected, all buttons become orange:

But if I open any of them, they remain blank for a while until they error out with a "took too long to respond" message:


What you expected to happen:
The extension showing the requested plots and metrics.
Anything else we need to know?:
Accessing the dashboard from the lab terminal with curl works, showing that it's not a basic networking issue:

Nothing apparent in the console:

Extensions:
jovyan@201435007e31:~$ jupyter serverextension list
config dir: /usr/local/etc/jupyter
dask_labextension enabled
- Validating...
dask_labextension 5.0.1 OK
jupyter_server_proxy enabled
- Validating...
jupyter_server_proxy OK
jupyterlab enabled
- Validating...
jupyterlab 3.0.16 OK
Environment:
- Dask version: 2021.5.0
- Python version: 3.8.8
- Operating System: Ubuntu Bionic
- Install method (conda, pip, source): pip
Sure, that makes sense. The dashboards are being accessed by your browser, so the scheduler needs a public IP for this to work. The address discovered by the extension is where the scheduler thinks the dashboard is. In this case that address is correct; it's just not accessible to you.
It looks like you have the jupyter proxy extension installed, so you should be able to access the dashboard through that. You may need to add a little config to allow access, but my guess is your dashboard will be available at
http://<Jupyter IP>/proxy/10.50.172.202:8787/status
thanks @jacobtomlinson, makes sense indeed. Maybe I got a bit too excited with the orange buttons and was hoping the extension would proxy itself ...
I've tried to access the dashboard, but looks like it's being blocked ... digging through the code and docs, it looks like I need to add the host to host_allowlist - does that go into the jupyter lab config file? Or can you point me to some example?
The downside is that I'd need to add / change the IP for every cluster, maybe not a permanent solution
thanks @jacobtomlinson, makes sense indeed. Maybe I got a bit too excited with the orange buttons and was hoping the extension would proxy itself ...
In the medium-term I would like to make this possible, it requires some internal plumbing, but is doable. I would probably do this in concert with some other proxy-related work, cf #190. But for the time being @jacobtomlinson is correct, it needs to be visible to your browser, not just the JupyterLab server.
I've tried to access the dashboard, but looks like it's being blocked ... digging through the code and docs, it looks like I need to add the host to host_allowlist - does that go into the jupyter lab config file? Or can you point me to some example?
Yes, this should go in the jupyter config file, something like c.ServerProxy.host_allowlist = ["10.x.x.x"].
The downside is that I'd need to add / change the IP for every cluster, maybe not a permanent solution
Yeah, a better long-term solution would be to allow this extension to handle this dynamically. But I think that host_allowlist can take a callable, so you may be able to do something like
c.ServerProxy.host_allowlist = lambda handler, host: host.split(".")[0] == "10"
(I have not tried this myself)
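A slightly more explicit variant of the callable idea, as a minimal sketch. It assumes jupyter-server-proxy calls the allowlist function with the request handler and the host string, and that the VPC uses 10.0.0.0/8. The function name and the CIDR are illustrative, not taken from this thread.

```python
import ipaddress

# Assumption: the VPC address range; replace with your own CIDR.
ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def allow_internal_hosts(handler, host):
    """Return True only when host is an IP literal inside ALLOWED_NETWORK."""
    try:
        return ipaddress.ip_address(host) in ALLOWED_NETWORK
    except ValueError:
        # host is a name (e.g. "Dask-Scheduler"), not an IP literal
        return False


# In the Jupyter config file, where `c` is provided by Jupyter:
# c.ServerProxy.host_allowlist = allow_internal_hosts
```

Checking against a network object rather than splitting the string also rejects malformed input such as "10.evil.example.com".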
thanks @ian-r-rose ... required some fiddling but it works now! Not going into details as this is more a proxy thing. I'd leave this issue open since this might be integrated into the extension. Feel free to close. Thanks!
@valpesendorfer Do you mind sharing what steps you took to get it working? I have my notebook and scheduler deployed as containers on ECS and am able to successfully curl http://Dask-Scheduler:8787/status but am having trouble getting it to proxy so I can use the dask lab extension. (Dask-Scheduler is based on the docker network. This also returns the expected response if I run curl 34.xx.xx.xx/status where 34.xx.xx.xx is the external IP address from the ECS task).
Based on discussion above I:
- added 34.xx.xx.xx to ~/.jupyter/jupyter_lab_config.py. I am unclear if modifications to this file will automatically be loaded? It is difficult to include the IP address to the container before launching it because I think it is automatically assigned. I was unable to find ways online to reload this configuration file while jupyter lab is actively running.
- Tried accessing the following URL from my browser: http://sagem-loadb-xxx.elb.us-west-2.amazonaws.com:8888/proxy/34.xx.xx.xx/8787/status where http://sagem-loadb-xxx.elb.us-west-2.amazonaws.com:8888 is the URL of the load balancer that my notebook is accessible through. 34.xx.xx.xx was looked up within the AWS console.
Trying to access the URL above resulted in a 404 error. I've tried several other URL combinations without luck as well. Thanks.
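One thing worth checking with a 404 like this: as far as the jupyter-server-proxy documentation describes it, a remote target is addressed as /proxy/<host>:<port>/<path>, with a colon between host and port rather than a slash. A small sketch for building such a URL follows. The function name is made up for illustration, and the base URL is whatever address your notebook is served under.

```python
def build_proxy_url(notebook_base, host, port, path=""):
    """Build a proxy URL of the form <base>/proxy/<host>:<port>/<path>."""
    base = notebook_base.rstrip("/")
    suffix = "/" + path.lstrip("/") if path else ""
    return f"{base}/proxy/{host}:{port}{suffix}"


# Example with the private scheduler address mentioned earlier in the thread.
print(build_proxy_url("http://localhost:8888", "10.50.172.202", 8787, "status"))
```

The host still has to pass the host_allowlist check before the proxy forwards the request.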
@rmcsqrd sorry, while I did make it work, I abandoned this idea and haven't used anything like it since, so I forgot all the details. But essentially it's like the steps outlined above: add the jupyter proxy extension, generate a config file if not already present, and set the host allowlist so it allows only IPs from your internal CIDR. That's it. After that, your dashboard should be available through the proxy. I haven't used it with a containerized scheduler or sagemaker, so I'm not sure if it makes any difference (it shouldn't, though).
@valpesendorfer Thanks for the response and the outline of steps.
RE generating the config file, I am assuming you did that by running jupyter lab --generate-config then adding a line similar to c.ServerProxy.host_allowlist = ["10.x.x.x"].
Do you know if jupyter lab will "hot reload" config changes if it is currently running or if it needs to be restarted? I am running into an issue where my jupyter lab container entrypoint immediately starts running jupyter lab; I tried to generate the config file externally then build it into the container but am unclear if that worked. I tried googling about the "hot reload" config thing but couldn't find anything. Thanks in advance for any insight you might have.
@rmcsqrd If I remember correctly, I used a callable as the comment suggests. If you specify a string like you do in the example, it'd probably take the xs literally.
RE hot-reload, I have no idea, sorry. But my gut feeling is no.
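To illustrate the point about the literal string: assuming the list form is checked by exact membership (an assumption about jupyter-server-proxy's behaviour, not verified in this thread), a placeholder entry never matches a real address, while a callable can match a whole range. The `is_allowed` helper below is a hypothetical stand-in for the proxy's check, not the library's code.

```python
def is_allowed(allowlist, host):
    """Hypothetical stand-in: callables decide, lists match exactly."""
    if callable(allowlist):
        return bool(allowlist(None, host))  # handler is unused in this sketch
    return host in allowlist


placeholder_list = ["10.x.x.x"]


def prefix_check(handler, host):
    return host.startswith("10.")


print(is_allowed(placeholder_list, "10.50.172.202"))  # False: "10.x.x.x" is just a string
print(is_allowed(prefix_check, "10.50.172.202"))      # True: the callable matches the range
```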
Thanks for the additional detail @valpesendorfer.
I don't expect hot-reloading to work. You will typically need to make sure that the configuration is present before the server starts up (some hosted systems allow you to configure a start script or similar, though I'm not familiar with how sagemaker does it).
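Since the config has to exist before the server starts, one option for a container is a small entrypoint script that writes the file and then launches JupyterLab. This is a minimal sketch under several assumptions: the config directory is ~/.jupyter (or JUPYTER_CONFIG_DIR if set), jupyter_server_config.py is a file your server reads, and the allowlist callable takes the handler and the host. The function names and the "10." rule are illustrative.

```python
import os
from pathlib import Path

CONFIG_SNIPPET = '''\
# Written by the entrypoint before JupyterLab starts.
c.ServerProxy.host_allowlist = lambda handler, host: host.startswith("10.")
'''


def write_proxy_config(config_dir=None):
    """Append the allowlist setting to the config file and return its path."""
    default_dir = os.environ.get("JUPYTER_CONFIG_DIR", str(Path.home() / ".jupyter"))
    target_dir = Path(config_dir or default_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    config_file = target_dir / "jupyter_server_config.py"  # assumed file name
    # Append so that any existing settings in the file are preserved.
    with config_file.open("a") as f:
        f.write(CONFIG_SNIPPET)
    return config_file


def main():
    write_proxy_config()
    # Replace this process with JupyterLab so it starts with the new config.
    os.execvp("jupyter", ["jupyter", "lab"])
```

Calling `main()` from the container entrypoint would write the config first and only then start the server, which sidesteps the hot-reload question entirely.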