dask-labextension
Listing DaskGateway clusters created via Python code alongside those created via the dask labextension UI
What happened:
I can create a dask-gateway cluster via the dask-labextension view, and it then shows up in the extension's cluster list.

But if I create a dask-gateway cluster from a notebook using code like the following, no cluster shows up in the extension's list:
from dask_gateway import Gateway
gateway = Gateway()
cluster = gateway.new_cluster()
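For what it's worth, dask-gateway itself can enumerate a user's clusters programmatically, even though the labextension doesn't pick them up. A minimal sketch (assuming a reachable gateway, e.g. one configured via the environment variables listed below):

```python
from dask_gateway import Gateway

gateway = Gateway()

# Ask the gateway server for all of this user's running clusters,
# including ones started from other notebooks or sessions.
for report in gateway.list_clusters():
    print(report.name, report.status)

# Reconnect to an existing cluster by name instead of creating a new one:
# cluster = gateway.connect(report.name)
```

Since `Gateway.list_clusters()` and `Gateway.connect()` already exist in the dask-gateway client, the extension could in principle use the same calls to discover these clusters.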
My wish
My wish is that clusters I've created programmatically would also be listed in the extension's UI. I'm not sure whether this is possible, but I'd like to describe the wish here to explore if we can make it happen one way or another.
Environment:
JupyterHub (1.1.1 Helm chart) + Dask-Gateway (0.9.0 Helm chart).
$ conda list | grep dask
dask 2021.6.0 pyhd8ed1ab_0 conda-forge
dask-core 2021.6.0 pyhd8ed1ab_0 conda-forge
dask-gateway 0.9.0 py38h578d9bd_0 conda-forge
dask-glm 0.2.0 py_1 conda-forge
dask-kubernetes 2021.3.1 pyhd8ed1ab_0 conda-forge
dask-labextension 5.0.2 pyhd8ed1ab_0 conda-forge
dask-ml 1.9.0 pyhd8ed1ab_0 conda-forge
pangeo-dask 2021.06.05 hd8ed1ab_0 conda-forge
$ python --version
Python 3.8.10
Operating System: Ubuntu 20.04
Install method: conda-forge
# The current environment and dask configuration via environment
DASK_DISTRIBUTED__DASHBOARD_LINK=/user/{JUPYTERHUB_USER}/proxy/{port}/status
DASK_GATEWAY__ADDRESS=http://10.100.116.39:8000/services/dask-gateway/
DASK_GATEWAY__AUTH__TYPE=jupyterhub
DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE={JUPYTER_IMAGE_SPEC}
DASK_GATEWAY__PROXY_ADDRESS=gateway://traefik-prod-dask-gateway.prod:80
DASK_GATEWAY__PUBLIC_ADDRESS=/services/dask-gateway/
DASK_LABEXTENSION__FACTORY__CLASS=GatewayCluster
DASK_LABEXTENSION__FACTORY__MODULE=dask_gateway
DASK_ROOT_CONFIG=/srv/conda/etc
Thanks for raising this @consideRatio . In general, this is a hard problem, as dask doesn't really have a built-in cluster discovery method. Short of port sniffing, I'm not sure I know of a good way to handle auto-detecting any cluster in a given notebook (or set of notebooks). Indeed, part of the reason for creating the cluster manager sidebar in the first place was to be able to build some user interfaces around starting, stopping, and scaling clusters that the extension can actually keep track of and reason about.
That being said, my goal for this extension is to get out of the game of managing clusters directly, and instead investigate a solution like dask-ctl. This could allow different cluster providers to set up their own discovery and control services, which the labextension could then consume. There is some detailed discussion of this in #189, I encourage you to weigh in!
I would really like dask-ctl to be the solution for this.
Related question: how do you configure the lab extension to use dask-gateway for creating new clusters? I can't find that anywhere in the docs, but from the screenshot above it is clearly possible.
@dharhas I haven't tried it recently myself, but the configuration that @consideRatio posted above looks like the correct approach to me (though it could also be set via a YAML config file rather than environment variables):
# The current environment and dask configuration via environment
DASK_DISTRIBUTED__DASHBOARD_LINK=/user/{JUPYTERHUB_USER}/proxy/{port}/status
DASK_GATEWAY__ADDRESS=http://10.100.116.39:8000/services/dask-gateway/
DASK_GATEWAY__AUTH__TYPE=jupyterhub
DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE={JUPYTER_IMAGE_SPEC}
DASK_GATEWAY__PROXY_ADDRESS=gateway://traefik-prod-dask-gateway.prod:80
DASK_GATEWAY__PUBLIC_ADDRESS=/services/dask-gateway/
DASK_LABEXTENSION__FACTORY__CLASS=GatewayCluster
DASK_LABEXTENSION__FACTORY__MODULE=dask_gateway
DASK_ROOT_CONFIG=/srv/conda/etc
In particular, the factory class and factory module options tell the labextension what to use when starting a new cluster.
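For completeness, the same settings can live in a Dask config file instead of environment variables. A sketch (the file name is illustrative; it would go in a directory Dask reads config from, such as the `DASK_ROOT_CONFIG` path above or `~/.config/dask/`):

```yaml
# labextension.yaml -- equivalent of the DASK_LABEXTENSION__* variables above
labextension:
  factory:
    module: dask_gateway
    class: GatewayCluster
    args: []
    kwargs: {}
```

Environment variables of the form `DASK_LABEXTENSION__FACTORY__CLASS` map onto the same nested `labextension.factory.*` keys via Dask's standard config mechanism (double underscores denote nesting).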