dask-kubernetes

Native Kubernetes integration for Dask

114 dask-kubernetes issues, sorted by recently updated

I recently used Dask on Kubernetes through the operator, which I installed using Helm. I noticed that workers are constantly killed while scaling up or down. I came across issues that were already registered and...

bug
operator
needs info

In #689 we added support to optionally configure `ServiceMonitor` and `PodMonitor` resources for Prometheus to track Dask components. It would be nice if the controller Pod also exported some metrics...

enhancement
operator

It'd be nice if `dask-kubernetes` had a nightly CI job that opened an issue when a test fails. Since the `main` branches of `dask` and `distributed` are already used,...

enhancement

In `dask_kubernetes.operator.kubecluster` we have `make_cluster_spec`, `make_scheduler_spec` and `make_worker_spec`. These are called by `dask_kubernetes.operator.KubeCluster` when creating a cluster, or they can be invoked directly, with the output modified and passed to `KubeCluster`....

bug
operator

When using a Dask cluster on Kubernetes with adaptive scaling, I noticed two issues. The first is that scale-up or scale-down requests are made repeatedly but no scaling happens...

bug
operator

Currently, when you create a `KubeCluster` object, it silently hangs until the scheduler has been created. I typically also have `kubectl`/`k9s` open in a separate terminal so I can watch what...

enhancement
operator
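A minimal sketch of the kind of feedback this issue asks for: poll a status source, print each change while waiting, and give up with an error after a timeout instead of hanging silently. `wait_for_scheduler` and the `get_status` callable are hypothetical names for illustration, not part of the `dask-kubernetes` API; a real implementation would query the Kubernetes API (for example the scheduler Pod's phase) inside `get_status`.

```python
import time
from typing import Callable


def wait_for_scheduler(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
    log: Callable[[str], None] = print,
) -> str:
    """Poll ``get_status`` until it returns "Running", logging each change.

    ``get_status`` is a hypothetical stand-in for a Kubernetes API query.
    Raises TimeoutError if the scheduler is not running within ``timeout``.
    """
    deadline = time.monotonic() + timeout
    last = None
    while True:
        status = get_status()
        if status != last:
            # Report only transitions so repeated polls stay quiet.
            log(f"Scheduler status: {status}")
            last = status
        if status == "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Scheduler not running after {timeout}s (last status: {status})"
            )
        time.sleep(interval)
```

For example, feeding it the sequence `"Pending", "Pending", "Running"` prints two lines (`Pending`, then `Running`) and returns `"Running"`.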

The `adapt` method in the `classic.KubeCluster` implementation relies on the [distributed.Cluster adapt method](https://github.com/dask/distributed/blob/19deee3b1ac7298dd6b7319481417b0b4e1986d8/distributed/deploy/cluster.py#L251:L266) and is synchronous whether the cluster is created with `asynchronous=False` or `asynchronous=True`. The [adapt](https://github.com/dask/dask-kubernetes/blob/main/dask_kubernetes/operator/kubecluster/kubecluster.py#L643:L658) method in the...

bug
operator
needs info
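One common way to make a method usable from both synchronous and asynchronous code is to return an awaitable when the object was created with `asynchronous=True` and to run the coroutine to completion otherwise. The sketch below shows that pattern with plain `asyncio`; `DualModeCluster` and its `_adapt` are hypothetical stand-ins, not the `classic.KubeCluster`, operator `KubeCluster` or `distributed.Cluster` implementations.

```python
import asyncio


class DualModeCluster:
    """Hypothetical cluster whose ``adapt`` works in sync and async code."""

    def __init__(self, asynchronous: bool = False):
        self.asynchronous = asynchronous

    async def _adapt(self, minimum: int, maximum: int) -> tuple:
        # Placeholder for the real work, e.g. creating or patching a
        # Kubernetes resource through an async client.
        await asyncio.sleep(0)
        return (minimum, maximum)

    def adapt(self, minimum: int = 0, maximum: int = 10):
        coro = self._adapt(minimum, maximum)
        if self.asynchronous:
            # Caller is inside an event loop and must ``await`` the result.
            return coro
        # Caller is synchronous: block until the coroutine finishes.
        return asyncio.run(coro)
```

`DualModeCluster().adapt(1, 5)` blocks and returns `(1, 5)`, while `await DualModeCluster(asynchronous=True).adapt(2, 8)` must be awaited inside a running loop. Note that `asyncio.run` cannot be called from inside an already running event loop, which is why libraries such as `distributed` typically run their event loop in a background thread rather than calling `asyncio.run` per method.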

When a `KubeCluster` is already in adaptive mode, calling `cluster.adapt()` again results in an error.

```python
cluster.adapt(4, 100)
---------------------------------------------------------------------------
ApiException                              Traceback (most recent call last)
~/Projects/dask/dask-kubernetes/dask_kubernetes/operator/kubecluster/kubecluster.py in _adapt(self, minimum, maximum)
...
```

bug
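Assuming the error comes from trying to create a resource that the first `adapt()` call already created (the traceback in this issue is truncated, so this is not confirmed), one possible fix is to fall back to updating the existing object when creation reports a conflict. The sketch below shows that create-or-patch pattern against a hypothetical in-memory store; `ConflictError`, `InMemoryAPI` and `apply_autoscaler` are illustrative names only, and a real implementation would use the Kubernetes client's create and patch calls and check for an HTTP 409 status on the `ApiException`.

```python
from typing import Dict


class ConflictError(Exception):
    """Hypothetical stand-in for an API error with HTTP status 409."""


class InMemoryAPI:
    """Hypothetical stand-in for a Kubernetes custom-resource client."""

    def __init__(self):
        self.objects: Dict[str, dict] = {}

    def create(self, name: str, spec: dict) -> dict:
        if name in self.objects:
            raise ConflictError(name)
        self.objects[name] = dict(spec)
        return self.objects[name]

    def patch(self, name: str, spec: dict) -> dict:
        # Merge the new fields into the existing object.
        self.objects[name].update(spec)
        return self.objects[name]


def apply_autoscaler(api: InMemoryAPI, name: str, minimum: int, maximum: int) -> dict:
    """Create the autoscaler object, or patch it if it already exists."""
    spec = {"minimum": minimum, "maximum": maximum}
    try:
        return api.create(name, spec)
    except ConflictError:
        # An earlier adapt() call already created it; update the bounds instead.
        return api.patch(name, spec)
```

With this pattern, a second call such as `apply_autoscaler(api, "demo", 4, 100)` updates the bounds on the existing object instead of raising.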

Dask version `2022.11.0`, which uses `distributed` version `2022.11.0`. `distributed` changelog: https://distributed.dask.org/en/stable/changelog.html

needs info