dask-kubernetes
DaskAutoscaler fails to be created
I'm trying to set up a simple DaskAutoscaler on Kubernetes using YAML files, but the autoscaler fails to be created. The operator reports the following errors:
Error Logging 45s kopf Timer 'daskautoscaler_adapt' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 276, in execute_handler_once
    result = await invoke_handler(
  File "/usr/local/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 371, in invoke_handler
    result = await invocation.invoke(
  File "/usr/local/lib/python3.10/site-packages/kopf/_core/actions/invocation.py", line 116, in invoke
    result = await fn(**kwargs)  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/dask_kubernetes/operator/controller/controller.py", line 850, in daskautoscaler_adapt
    scheduler = await Pod.get(
  File "/usr/local/lib/python3.10/site-packages/kr8s/_objects.py", line 186, in get
    raise NotFoundError(f"Could not find {cls.kind} {name}.")
kr8s._exceptions.NotFoundError: Could not find Pod None.
Error Logging 2s kopf Handler 'daskautoscaler_create' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 276, in execute_handler_once
    result = await invoke_handler(
  File "/usr/local/lib/python3.10/site-packages/kopf/_core/actions/execution.py", line 371, in invoke_handler
    result = await invocation.invoke(
  File "/usr/local/lib/python3.10/site-packages/kopf/_core/actions/invocation.py", line 116, in invoke
    result = await fn(**kwargs)  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/dask_kubernetes/operator/controller/controller.py", line 841, in daskautoscaler_create
    autoscaler = await DaskAutoscaler(body)
  File "/usr/local/lib/python3.10/site-packages/kr8s/_objects.py", line 45, in __init__
    raise ValueError("resource must be a dict or a string")
ValueError: resource must be a dict or a string
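Judging only by the message, the second error looks like a type check in the constructor rejecting the object it was given. The sketch below shows how such a check could reject a mapping-like object that is not a plain dict. It is an illustration under that assumption: `FakeBody` and `check_resource` are hypothetical names and are not the kopf or kr8s implementation.

```python
from collections.abc import Mapping


class FakeBody(Mapping):
    """Hypothetical read-only mapping standing in for the handler's body object."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def check_resource(resource):
    """Hypothetical check that raises the same message as in the traceback."""
    if not isinstance(resource, (dict, str)):
        raise ValueError("resource must be a dict or a string")
    return resource


body = FakeBody({"kind": "DaskAutoscaler", "metadata": {"name": "autoscaled"}})

# A Mapping subclass is not a dict instance, so the check rejects it.
try:
    check_resource(body)
    rejected = False
except ValueError:
    rejected = True

# Converting the mapping to a plain dict passes the same check.
accepted = check_resource(dict(body))
```

If this reading is right, the failure would depend on the types exchanged between the installed library versions rather than on the contents of the YAML files.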
This is the autoscaler.yaml file I am using:
apiVersion: kubernetes.dask.org/v1
kind: DaskAutoscaler
metadata:
  namespace: dask
  name: autoscaled
spec:
  cluster: autoscaled
  minimum: 1
  maximum: 5
The cluster YAML definition is as follows:
apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: autoscaled
  namespace: dask
spec:
  worker:
    replicas: 0
    spec:
      serviceAccountName: dask-operator-sa
      tolerations:
        - key: dedicated
          operator: Equal
          value: dask-worker
      nodeSelector:
        dedicated: dask-worker
      containers:
        - name: worker
          image: "ghcr.io/dask/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - dask-worker
            - --name
            - $(DASK_WORKER_NAME)
            - --dashboard
            - --dashboard-address
            - "8788"
          ports:
            - name: http-dashboard
              containerPort: 8788
              protocol: TCP
          env:
            - name: EXTRA_PIP_PACKAGES
              value: pyarrow s3fs
          resources:
            limits:
              cpu: "2"
              memory: "18G"
            requests:
              cpu: "1"
              memory: "16G"
  scheduler:
    spec:
      containers:
        - name: scheduler
          image: "ghcr.io/dask/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - dask-scheduler
          ports:
            - name: tcp-comm
              containerPort: 8786
              protocol: TCP
            - name: http-dashboard
              containerPort: 8787
              protocol: TCP
          readinessProbe:
            httpGet:
              port: http-dashboard
              path: /health
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              port: http-dashboard
              path: /health
            initialDelaySeconds: 15
            periodSeconds: 20
          resources:
            limits:
              cpu: "1"
              memory: "3G"
            requests:
              cpu: "1"
              memory: "2G"
          env:
            - name: EXTRA_PIP_PACKAGES
              value: pyarrow s3fs
            - name: DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION
              value: "1.0"
    service:
      type: NodePort
      selector:
        dask.org/cluster-name: autoscaled
        dask.org/component: scheduler
      ports:
        - name: tcp-comm
          protocol: TCP
          port: 8786
          targetPort: "tcp-comm"
        - name: http-dashboard
          protocol: TCP
          port: 8787
          targetPort: "http-dashboard"
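As a sanity check on the two manifests, the autoscaler should point at the cluster by name, live in the same namespace, and have consistent bounds. Here is a minimal sketch of that check, with both manifests reduced to the relevant fields as plain dicts. The helper names `autoscaler_targets_cluster` and `bounds_are_valid` are hypothetical, and the rules they encode are my assumption about what the operator expects, not something taken from its code.

```python
# Relevant fields copied from the two manifests above.
autoscaler = {
    "kind": "DaskAutoscaler",
    "metadata": {"namespace": "dask", "name": "autoscaled"},
    "spec": {"cluster": "autoscaled", "minimum": 1, "maximum": 5},
}
cluster = {
    "kind": "DaskCluster",
    "metadata": {"namespace": "dask", "name": "autoscaled"},
}


def autoscaler_targets_cluster(autoscaler, cluster):
    """Assumed rule: same namespace, and spec.cluster equals the cluster name."""
    same_namespace = (
        autoscaler["metadata"].get("namespace")
        == cluster["metadata"].get("namespace")
    )
    same_name = autoscaler["spec"]["cluster"] == cluster["metadata"]["name"]
    return same_namespace and same_name


def bounds_are_valid(autoscaler):
    """Assumed rule: 0 <= minimum <= maximum."""
    spec = autoscaler["spec"]
    return 0 <= spec["minimum"] <= spec["maximum"]


targets_ok = autoscaler_targets_cluster(autoscaler, cluster)
bounds_ok = bounds_are_valid(autoscaler)
```

Both checks pass for the manifests above, so the name and namespace references do not look like the source of the problem.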
Environment:
- Dask version: 2023.7.0
- Python version: 3.10.9