dask-kubernetes

Readiness/Liveness probes do not accept integer port

Open mmourafiq opened this issue 11 months ago • 4 comments

Describe the issue:

Although the cluster specification suggests that probe ports are int-or-string, using an integer port in a probe raises an error. In this example, based on the documentation, the named port http-dashboard is swapped for the integer 8786. That is, this configuration:

              readinessProbe:
                httpGet:
                  port: http-dashboard
                  path: /health
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:
                httpGet:
                  port: http-dashboard
                  path: /health
                initialDelaySeconds: 15
                periodSeconds: 20

is replaced with this:

              readinessProbe:
                httpGet:
                  port: 8786
                  path: /health
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:
                httpGet:
                  port: 8786
                  path: /health
                initialDelaySeconds: 15
                periodSeconds: 20

If you check the type definition of the probes, e.g. the Python client definition at https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1HTTPGetAction.md, you will notice that port is of type object and accepts either a string or an integer. See also the Kubernetes docs: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request
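
For reference, a CRD's OpenAPI v3 schema can declare a field that accepts both forms by using `x-kubernetes-int-or-string`. Below is a minimal sketch of what such a schema could look like for the probe port. The surrounding structure is illustrative only and is not copied from the operator's actual CRD:

```yaml
# Hypothetical fragment of a CRD schema for an httpGet probe action.
# Field layout is an assumption for illustration, not the operator's real schema.
httpGet:
  type: object
  properties:
    path:
      type: string
    port:
      # Accept either an integer port number or a named port.
      x-kubernetes-int-or-string: true
      anyOf:
        - type: integer
        - type: string
  required:
    - port
```

With a schema along these lines, both `port: 8786` and `port: http-dashboard` would pass validation.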

Full example:

apiVersion: kubernetes.dask.org/v1
kind: DaskJob
metadata:
  name: simple-job
  namespace: default
spec:
  job:
    spec:
      containers:
        - name: job
          image: "ghcr.io/dask/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - python
            - -c
            - "from dask.distributed import Client; client = Client(); # Do some work..."

  cluster:
    spec:
      worker:
        replicas: 2
        spec:
          containers:
            - name: worker
              image: "ghcr.io/dask/dask:latest"
              imagePullPolicy: "IfNotPresent"
              args:
                - dask-worker
                - --name
                - $(DASK_WORKER_NAME)
                - --dashboard
                - --dashboard-address
                - "8788"
              ports:
                - name: http-dashboard
                  containerPort: 8788
                  protocol: TCP
              env:
                - name: WORKER_ENV
                  value: hello-world # We don't test the value, just the name
      scheduler:
        spec:
          containers:
            - name: scheduler
              image: "ghcr.io/dask/dask:latest"
              imagePullPolicy: "IfNotPresent"
              args:
                - dask-scheduler
              ports:
                - name: tcp-comm
                  containerPort: 8786
                  protocol: TCP
                - name: http-dashboard
                  containerPort: 8787
                  protocol: TCP
              readinessProbe:
                httpGet:
                  port: 8786
                  path: /health
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:
                httpGet:
                  port: 8786
                  path: /health
                initialDelaySeconds: 15
                periodSeconds: 20
              env:
                - name: SCHEDULER_ENV
                  value: hello-world
        service:
          type: ClusterIP
          selector:
            dask.org/cluster-name: simple-job
            dask.org/component: scheduler
          ports:
            - name: tcp-comm
              protocol: TCP
              port: 8786
              targetPort: "tcp-comm"
            - name: http-dashboard
              protocol: TCP
              port: 8787
              targetPort: "http-dashboard"

Anything else we need to know?:

The error during the submission:

spec.cluster.spec.scheduler.spec.containers[0].readinessProbe.httpGet.port: Invalid value: "integer": spec.cluster.spec.scheduler.spec.containers[0].readinessProbe.httpGet.port in body must be of type string: "integer"
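
Since the validation message asks for a string, referencing the port by name should pass this check in the meantime. Below is a sketch of that workaround, assuming the probe is meant to target the port named tcp-comm (8786) declared on the scheduler container in the full example. Whether /health is actually served on that port is a separate question this sketch does not address:

```yaml
# Workaround sketch: refer to the container port by its name instead of its number.
ports:
  - name: tcp-comm
    containerPort: 8786
    protocol: TCP
readinessProbe:
  httpGet:
    port: tcp-comm   # a string, so it satisfies the "must be of type string" check
    path: /health
  initialDelaySeconds: 5
  periodSeconds: 10
```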

Environment:

  • Dask version: latest
  • Dask Kubernetes operator

mmourafiq · Jul 25 '23 13:07