dask-kubernetes
Native Kubernetes integration for Dask
I recently used Dask on Kubernetes via the operator, which I installed with Helm. I noticed that workers are constantly killed during scaling up/down. I came across issues registered already and...
In #689 we added support to optionally configure `ServiceMonitor` and `PodMonitor` resources for Prometheus to track Dask components. It would be nice if the controller Pod also exported some metrics...
It'd be nice if `dask-kubernetes` had a nightly CI job that opened an issue whenever a test failure occurs. Since the `main` branches of `dask` and `distributed` are already used,...
In `dask_kubernetes.operator.kubecluster` we have `make_cluster_spec`, `make_scheduler_spec` and `make_worker_spec`. These are called by `dask_kubernetes.operator.KubeCluster` when creating a cluster or can be invoked directly and the output modified and passed to `KubeCluster`....
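The "modify the output, then pass it on" workflow described above might look roughly like the sketch below. To keep it runnable without a Kubernetes cluster, `example_cluster_spec` is a hypothetical stand-in for `make_cluster_spec`, and the dict layout it returns is an assumption for illustration rather than the library's exact output. `with_worker_node_selector` is likewise a hypothetical helper.

```python
import copy


def example_cluster_spec(name, n_workers=2, image="ghcr.io/dask/dask:latest"):
    """Hypothetical stand-in for make_cluster_spec (layout is assumed)."""
    return {
        "apiVersion": "kubernetes.dask.org/v1",
        "kind": "DaskCluster",
        "metadata": {"name": name},
        "spec": {
            "worker": {
                "replicas": n_workers,
                "spec": {"containers": [{"name": "worker", "image": image}]},
            },
            "scheduler": {
                "spec": {"containers": [{"name": "scheduler", "image": image}]},
            },
        },
    }


def with_worker_node_selector(spec, selector):
    """Return a copy of spec whose worker pod spec carries a nodeSelector."""
    modified = copy.deepcopy(spec)  # leave the caller's spec untouched
    modified["spec"]["worker"]["spec"]["nodeSelector"] = dict(selector)
    return modified


base = example_cluster_spec("example")
custom = with_worker_node_selector(base, {"disktype": "ssd"})
print(custom["spec"]["worker"]["spec"]["nodeSelector"])
```

With the real library, the base dict would come from `make_cluster_spec`, and the modified dict would then be handed to `KubeCluster`; the exact keyword argument for doing so should be checked against the installed version's documentation.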
When using a Dask cluster on Kubernetes with adaptive scaling, I noticed two issues. One is that scale-up or scale-down requests are issued repeatedly but no scaling happens...
Currently, when you create a `KubeCluster` object, it silently hangs until the scheduler has been created. I typically also have `kubectl`/`k9s` in a separate terminal so I can watch what...
The `adapt` method in the `classic.KubeCluster` implementation relies on the [distributed.Cluster adapt method](https://github.com/dask/distributed/blob/19deee3b1ac7298dd6b7319481417b0b4e1986d8/distributed/deploy/cluster.py#L251:L266) and is synchronous whether we create the cluster with `asynchronous=False` or `asynchronous=True`. The [adapt](https://github.com/dask/dask-kubernetes/blob/main/dask_kubernetes/operator/kubecluster/kubecluster.py#L643:L658) method in the...
When a `KubeCluster` is already in adaptive mode, calling `cluster.adapt()` again results in an error.

```python
cluster.adapt(4, 100)
---------------------------------------------------------------------------
ApiException                              Traceback (most recent call last)
~/Projects/dask/dask-kubernetes/dask_kubernetes/operator/kubecluster/kubecluster.py in _adapt(self, minimum, maximum)...
```
Dask version `2022.11.0` which uses version `2022.11.0` of `distributed`. `distributed` changelogs: https://distributed.dask.org/en/stable/changelog.html