dask-gateway
k8s-backed cluster remains active when client's kernel dies unintentionally
What happened:
I'm not sure if this is a bug report or a feature request, but the behavior is not what we want for our use case, and I can't find a way to change it. The issue occurs when the kernel for a notebook containing a client (linked to a `GatewayCluster`) dies unintentionally (e.g. due to a memory overload). We're using the Kubernetes backend within a `daskhub` helm chart configuration. Even if the cluster was created with `shutdown_on_close=True`, the scheduler does not shut down in these instances; instead it remains active until its `idle_timeout` setting is reached, and it can be reconnected to from a new notebook. This does not occur if the cluster is shut down explicitly, nor if the kernel is intentionally shut down (or restarted).
What you expected to happen:
The behavior I expect (and desire) is based on my experience with `dask-kubernetes`, where I was mainly using a local scheduler. In that case the scheduler was virtually always in the same kernel as the notebook, so when one died, the other died. I recognize the `dask-gateway` remote-scheduler situation may be harder to handle, and that in some cases you would want the scheduler to persist through an event like this. But I'm wondering if there could be a flag to indicate that a cluster should shut down even when a client connection is closed unintentionally.
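For contrast, here's a sketch of the explicit paths that do tear the scheduler down today (again with placeholder configuration):

```python
from dask_gateway import GatewayCluster

# Explicit shutdown works: the scheduler is removed promptly.
cluster = GatewayCluster(shutdown_on_close=True)
client = cluster.get_client()
cluster.shutdown()

# The context-manager form also works, but only on a clean exit:
with GatewayCluster(shutdown_on_close=True) as cluster:
    client = cluster.get_client()
    # ... do work ...
# close() runs here and shuts the cluster down; it never runs if the
# kernel dies without unwinding the Python process.
```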
Anything else we need to know?: If this is just a matter of implementation, I'm happy to take a crack at it if others have a sense of where to start. I did a bit of digging but couldn't pinpoint where this behavior might be altered.
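In the meantime, the only partial mitigation I can see is server-side: lowering the gateway's idle timeout so orphaned schedulers get reaped sooner. A sketch, assuming the daskhub chart's `gateway.extraConfig` hook and the `idle_timeout` setting from the dask-gateway config reference (the exact key and the 600-second value are illustrative):

```python
# dask-gateway server-side traitlets config, e.g. injected via the daskhub
# chart's gateway.extraConfig. Assumes idle_timeout is available on the
# Kubernetes backend's cluster config; 600 seconds is illustrative only.
c.KubeClusterConfig.idle_timeout = 600
```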
Environment:
- Dask version: 2021.4.1
- Python version: 3.8.8
- Operating System: Linux (`daskhub` helm chart defining a Kubernetes setup)