
[ENH] - Ensure that default Dask Gateway environment matches active kernel environment

Open dharhas opened this issue 2 years ago • 7 comments

Feature description

Currently, the Dask Gateway cluster options default to the first available environment rather than the environment actually being used by the notebook. If that environment doesn't have dask in it, the next stage just hangs, and it is very easy not to notice that dask-gateway is pointed at the wrong environment when running through all the cells.

I propose that we ensure the default conda environment for Dask is the one actively being used by the Jupyter kernel, since that is the most sensible default.

In the example below we see that the filesystem/dashboard env is the default, even though the notebook is running filesystem/dask:

from dask_gateway import Gateway

gateway = Gateway()

# The environment dropdown in the resulting widget defaults to the
# first available environment, not the one the kernel is running in.
options = gateway.cluster_options()
options
(screenshot: the cluster options widget, with filesystem/dashboard selected as the default environment)

Value and/or benefit

Makes using Dask-Gateway less error prone and improves usability.

Anything else?

No response

dharhas avatar May 20 '22 02:05 dharhas

This will definitely need investigation. Googling briefly, I don't see a straightforward way to get the Jupyter kernel name without JavaScript.

costrouc avatar May 20 '22 03:05 costrouc

This also ties into reproducibility and dashboarding: knowing which kernel is being used and recording it in the notebook metadata can help with reproduction, and also with picking a good default environment for dashboard sharing.
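A minimal sketch of the metadata idea: Jupyter notebooks record the kernel they were run with under metadata.kernelspec.name, which on a Nebari deployment typically maps to a conda environment. The helper name below is hypothetical, and the mapping from kernel name to environment name is an assumption about the deployment.

```python
import json

def kernel_env_from_notebook(path):
    """Return the kernelspec name recorded in a notebook's metadata.

    Jupyter stores the kernel used to run a notebook under
    metadata.kernelspec.name; returns None when no kernelspec
    is recorded.
    """
    with open(path) as f:
        nb = json.load(f)
    return nb.get("metadata", {}).get("kernelspec", {}).get("name")
```

This only tells you which kernel the notebook was last saved with, not which kernel is live, but that is exactly what matters for reproduction and dashboard sharing.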

dharhas avatar May 20 '22 03:05 dharhas

There seems to be a default config yaml that can be loaded here, https://gateway.dask.org/configuration-user.html#default-configuration, which has the cluster options in it -- we might be able to set the env programmatically in there, though I don't know how that interacts with gateway.cluster_options().
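As a sketch of that approach, a user-level ~/.config/dask/gateway.yaml could pin the default environment. Note that the option name conda_environment below is an assumption; the options actually exposed depend on how the gateway's cluster options are configured server-side.

```yaml
# ~/.config/dask/gateway.yaml -- user-level Dask Gateway defaults.
# NOTE: "conda_environment" is an assumed option name; check the
# fields shown by gateway.cluster_options() on your deployment.
gateway:
  cluster:
    options:
      conda_environment: filesystem/dask
```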

viniciusdc avatar May 20 '22 13:05 viniciusdc

c.c @costrouc I think this might help as well https://docs.dask.org/en/latest/deploying-kubernetes-helm.html?highlight=conda%20environemt#matching-the-user-environment

viniciusdc avatar May 20 '22 15:05 viniciusdc

We can set the filesystem/dask env as the default, which can easily be overridden via the cluster options GUI. The only issue is that we can't automatically detect the active environment this way... unless we do something during the deployment (e.g. bash with the conda active-env variable) to dynamically update the file .config/dask/gateway.yaml

edit* It seems to be possible using $CONDA_DEFAULT_ENV

(screenshot: $CONDA_DEFAULT_ENV resolving to the active conda environment)

As we are using dask_gateway to perform this, together with the Dask permission system from Keycloak, we should be okay with the default env containing dask during this "inspection".
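A minimal sketch of the $CONDA_DEFAULT_ENV approach, assuming the variable is exported by conda activate in the kernel's process (the fallback value and the helper name are illustrative):

```python
import os

def detect_active_conda_env(fallback="filesystem/dask"):
    """Return the conda environment active in the current process.

    `conda activate` exports CONDA_DEFAULT_ENV, so inside a Jupyter
    kernel this should name the kernel's environment; fall back to a
    known dask-capable environment when the variable is unset.
    """
    return os.environ.get("CONDA_DEFAULT_ENV", fallback)
```

Usage might then look like options.conda_environment = detect_active_conda_env() after calling gateway.cluster_options(), though conda_environment is again an assumed option name.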

viniciusdc avatar May 20 '22 19:05 viniciusdc

Hi @Chris Ostrouchov, about the Gateway default option for the cluster env, what do you think of using the above approach?

viniciusdc avatar May 20 '22 19:05 viniciusdc

cc @viniciusdc for visibility

dcmcand avatar Feb 08 '24 17:02 dcmcand

I forgot about this; I will open a PR, as this is now easier to achieve using the conda-store endpoints.

viniciusdc avatar Apr 18 '24 17:04 viniciusdc