dask-kubernetes
Native Kubernetes integration for Dask
The Dask scheduler can take a while to retire workers via `/api/v1/retire_workers`. Because this operation is synchronous, from httpx's perspective the server is hanging up the connection, thus producing a timeout...
Currently, the operator hardcodes the TCP protocol: https://github.com/dask/dask-kubernetes/blob/92714da5785709726f85c4c6ec92451f5c23ad04/dask_kubernetes/operator/controller/controller.py#L155-L159
Currently, the operator retires workers using the HTTP or RPC APIs. However, those only control the connected Dask workers; the operator should take into account Dask's Kubernetes worker pods that...
Implement a way to set a cool-down period for adaptive scaling instead of the hardcoded https://github.com/dask/dask-kubernetes/blob/92714da5785709726f85c4c6ec92451f5c23ad04/dask_kubernetes/operator/controller/controller.py#L816 e.g.
```yaml
apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  annotations:
    kubernetes.dask.org/cooldown-until-interval: "30s"
  name: dask-f3a0c12f
  namespace: default
...
```
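To make the proposal concrete, here is a minimal sketch of how a value such as `"30s"` could be read from the proposed annotation and converted to seconds. The helper `parse_cooldown`, the accepted units (`s`, `m`, `h`), and the `default` of 15 seconds are assumptions made for illustration. They are not part of the operator, and the default is a placeholder rather than the value currently hardcoded in the controller.

```python
import re

# Annotation key taken from the proposal above; everything else is hypothetical.
COOLDOWN_ANNOTATION = "kubernetes.dask.org/cooldown-until-interval"

# Assumed unit suffixes: seconds, minutes, hours.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_DURATION_RE = re.compile(r"(\d+)([smh])")


def parse_cooldown(annotations, default=15.0):
    """Return the cool-down in seconds, or `default` if the annotation is absent."""
    raw = annotations.get(COOLDOWN_ANNOTATION)
    if raw is None:
        return float(default)
    match = _DURATION_RE.fullmatch(raw.strip())
    if match is None:
        # Reject anything that is not "<integer><unit>" instead of guessing.
        raise ValueError(f"unsupported cool-down value: {raw!r}")
    value, unit = match.groups()
    return float(int(value) * _UNIT_SECONDS[unit])
```

Failing loudly on an unparseable value is a deliberate choice in this sketch: silently falling back to the default would hide a typo in the annotation.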
I'm trying to set up a simple DaskAutoscaler on Kubernetes using YAML files, but the autoscaler fails to be created with the following error:
```bash
Error Logging 45s kopf...
```
**Describe the issue**: Although the cluster specification suggests `int_or_type`, using integer probes raises an error. Here's an example based on the documentation where the port `http_dashboard` is...
Right now our CRDs are at `v1` and haven't changed in a breaking way since they were introduced. However, we are now at the point where we might want to...
**Describe the issue**: As far as I know, this happened without so much as updating a dependency. When creating a KubeCluster, I get a stack trace saying my service account...
Not all cluster auth providers support refresh tokens. When running out of cluster, `KubeCluster` fails to instantiate without one when parsing the Kubernetes configuration file. It would be helpful if there was...
When you create a DaskCluster (or other CR), the labels get propagated to the child resources. However, if you update a DaskCluster with a label after it was created, the...
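As an illustration of what propagating a label change to child resources involves, the sketch below computes the difference between a parent's previous and current labels as a JSON-merge-patch style dictionary, where a `None` value marks a label for removal. The function `label_patch` is hypothetical and does not reflect how the operator handles labels; it only touches keys the parent owned, so labels set on a child by other means are left alone.

```python
def label_patch(old_labels, new_labels):
    """Return a merge-patch dict describing how labels changed on the parent.

    Added or changed keys map to their new value; keys removed from the
    parent map to None, which a JSON merge patch interprets as deletion.
    """
    # Keys that are new or whose value changed.
    patch = {k: v for k, v in new_labels.items() if old_labels.get(k) != v}
    # Keys the parent used to have but no longer does.
    patch.update({k: None for k in old_labels if k not in new_labels})
    return patch
```

An empty result means the children are already in sync and no patch needs to be sent.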