New users dashboard experience
I just helped a Dask user get the Dashboard running during the SciPy sprints. It was pretty painful, with a bunch of failures along the way. I wanted to capture that here in case there are things we can do to improve this experience.
- The user was running Jupyter Lab on a remote machine via SSH; they set up SSH port forwarding to access Jupyter
- They created a Dask cluster with `client = Client()`
- **Failure 1** They clicked the link in the widget, which said `http://localhost:8787/status` and wasn't port forwarded
- **Failure 2** They added `8787` to the port forward list to fix this, but some restriction in their org setup meant this didn't work (I think `8888` may have been on an allowlist but `8787` wasn't)
- **Failure 3** While testing they continued to run `client = Client()`, which created many clusters, and they were confused why the port kept changing
- @jsignell and I got involved at this point to help out. My suggestion was to use jupyter-server-proxy so that we could access the dashboard via Jupyter, which was known to be working
- **Failure 4** We stopped Jupyter, ran `pip install jupyter-server-proxy`, and started Jupyter again. Sadly Jupyter was running from the base environment and we pip installed into the activated environment, so the install didn't take effect. This took a long time to debug, but we eventually got things installed in the right environment (with help from @yuvipanda)
- **Failure 5** Installing `jupyter-server-proxy` upgraded Jupyter, which broke the user's Jupyter config and also took a while to debug
- **Failure 6** Finally we got back into Jupyter, started the Dask cluster, and the user clicked the link in the widget again, which still said `http://localhost:8787/status` and didn't work
- We then manually navigated to `http://localhost:8888/proxy/8787/status` and things worked 🎉
- Finally we added a config line to fix the widget: `dask.config.set({"distributed.dashboard.link": "/proxy/{port}/status"})`
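For context on why that config line fixes the widget: the `distributed.dashboard.link` value is a format string that Dask fills in with the host, port, and environment variables when rendering the link. A minimal sketch of that substitution (the helper name here is illustrative, not Dask's actual API):

```python
import os


def render_dashboard_link(template: str, host: str, port: int) -> str:
    """Fill a dashboard link template the way Dask does: the {host} and
    {port} fields plus environment variables are all available."""
    return template.format(host=host, port=port, **os.environ)


# The default template produces a raw scheduler address, which only works
# if the dashboard port itself is reachable (e.g. port forwarded):
render_dashboard_link("http://{host}:{port}/status", "localhost", 8787)

# The proxy template instead produces a path served through Jupyter, so
# only the (already working) Jupyter port needs to be reachable:
render_dashboard_link("/proxy/{port}/status", "localhost", 8787)
```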
This was an unpleasant experience for a first-time Dask user. It was time-consuming to debug for three maintainers from across Dask and Jupyter.
I think in an ideal world the steps we could take to resolve this are:
1. Ensure `jupyter-server-proxy` is installed with Jupyter by default
2. Autodetect the dashboard URL in the default case
Given that 1 isn't in our control, we could focus on 2.
Instead of setting a default value for `distributed.dashboard.link`, we could leave it as `None` and do some autodetection in that case. For instance, could we check if we are inside a notebook kernel and if jupyter-server-proxy is installed, and if so put out the proxy URL instead? Or we could check if we are on Binder and put out the Binder URL (xref https://github.com/dask/dask-tutorial/pull/260).
If autodetection fails we can fall back to the current default. And if the user has specified it explicitly we would use that.
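The fallback order described here could be sketched as a small decision function. This is only an illustration of the proposed logic, not anything in distributed today; `choose_dashboard_link` and the `in_notebook_kernel` flag are hypothetical names:

```python
from importlib.util import find_spec

# Current hard-coded default template
DEFAULT_LINK = "http://{host}:{port}/status"


def choose_dashboard_link(user_setting=None, in_notebook_kernel=False):
    """Pick a dashboard link template.

    Explicit user config always wins; otherwise try to autodetect a
    jupyter-server-proxy setup; otherwise fall back to the default.
    """
    if user_setting is not None:
        # The user said what they want; use it verbatim
        return user_setting
    if in_notebook_kernel and find_spec("jupyter_server_proxy") is not None:
        # We appear to be running under Jupyter with the proxy available
        return "/proxy/{port}/status"
    # A Binder check (e.g. via its environment variables) could slot in here
    return DEFAULT_LINK
```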
Thanks for writing this up @jacobtomlinson, it's always useful (and humbling) to see this through new users' eyes.
> **Failure 3** While testing they continued to run `client = Client()`, which created many clusters, and they were confused why the port kept changing
This one regularly hits me as well, and I have to go on a "find the rogue server" mission. I don't have a good solution to it.
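For the "find the rogue server" mission, one blunt but dependency-free starting point is to probe localhost for listening ports near the dashboard default. This is just a heuristic sketch (when 8787 is taken, Dask may pick a port that isn't adjacent, so the range here is a guess):

```python
import socket
from contextlib import closing


def scan_dashboard_ports(start=8787, count=20, host="127.0.0.1"):
    """Return ports in [start, start + count) that accept a TCP connection,
    i.e. candidates for a running (possibly rogue) dashboard."""
    listening = []
    for port in range(start, start + count):
        with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
            sock.settimeout(0.2)
            # connect_ex returns 0 when something is listening on the port
            if sock.connect_ex((host, port)) == 0:
                listening.append(port)
    return listening
```

Each hit can then be opened in a browser as `http://localhost:<port>/status` to see which scheduler it belongs to.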
Did you ever try to use dask-labextension? Setting up a proxy URL automatically for a scheduler running on localhost is intended to be the happy path for the cluster manager UI in the sidebar. I think of that cluster manager as being largely a failed experiment (and I'd like it to be replaced by dask-ctl, as you know), but this particular scenario is one that should be supported today.
> - Ensure `jupyter-server-proxy` is installed with Jupyter by default
Even though I don't think this is likely to ever be the case, jupyter-server-proxy is a dependency of dask-labextension, so that would help bring it in.
> For instance, could we check if we are inside a notebook kernel and if jupyter-server-proxy is installed, and if so put out the proxy URL instead?
One is not supposed to do this from a moral perspective, as
- Different jupyter clients might be rendering the link, not just notebooks
- Different notebook implementations might be rendering the output (e.g., VS Code, nteract, CoCalc)
But I agree that the status quo here is pretty bad, so I might be willing to bend a bit :)
> Did you ever try to use dask-labextension?
We didn't. I think we would've run into the same issues with installing things in the wrong conda environment, but if we had gone down that route instead of trying to use jupyter-server-proxy directly we could've avoided failure 6.
> Even though I don't think this is likely to ever be the case, jupyter-server-proxy is a dependency of dask-labextension, so that would help bring it in.
Oooh I didn't think of this. That's good to know.
> One is not supposed to do this from a moral perspective ... But I agree that the status quo here is pretty bad, so I might be willing to bend a bit :)
Yeah this is a tricky one. @jsignell and I spent a bunch of time discussing how we could more intelligently handle the dashboard URL. There is definitely scope for more magic, given that the current user experience is unpleasant, but magic can be brittle and opaque. If you have thoughts on how we could improve this I'd be really keen to chat about it.