node-feature-discovery

Make readiness and liveness probes configurable

Open slyt opened this issue 1 year ago • 3 comments

What would you like to be added: I'd like the readiness and liveness probes to be configurable in the Helm values.

Why is this needed: node-feature-discovery-master pods use gRPC probes, which are behind an alpha feature gate in k8s v1.23, beta in v1.24, and GA in v1.27. This is a problem when deploying node-feature-discovery on older versions of Kubernetes.

On k8s v1.23 without the gRPC feature gate enabled, the node-feature-discovery-master pods never appear ready. Here are the events:

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  25m                    default-scheduler  Successfully assigned nvidia-gpu-operator/nvidia-gpu-operator-node-feature-discovery-master-7c8c9856svd9x to dev-worker-cpu-0
  Normal   Pulled     25m                    kubelet            Container image "registry.k8s.io/nfd/node-feature-discovery:v0.15.4" already present on machine
  Normal   Created    25m                    kubelet            Created container master
  Normal   Started    25m                    kubelet            Started container master
  Warning  Unhealthy  19m (x31 over 24m)     kubelet            Liveness probe errored: missing probe handler for nvidia-gpu-operator-node-feature-discovery-master-7c8c9856svd9x_nvidia-gpu-operator(d216ddfc-d0a7-4bb5-950f-38248bfc8d17):master
  Warning  Unhealthy  4m50s (x136 over 24m)  kubelet            Readiness probe errored: missing probe handler for nvidia-gpu-operator-node-feature-discovery-master-7c8c9856svd9x_nvidia-gpu-operator(d216ddfc-d0a7-4bb5-950f-38248bfc8d17):master
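
For reference, the other way around this on v1.23 would be to turn the gate on. A rough sketch, assuming the kubelet is driven by a config file and that the gate in question is `GRPCContainerProbe`; I have not verified this on my cluster, and I believe the same gate also has to be enabled on the kube-apiserver, otherwise the `grpc` field may be dropped from the pod spec before the kubelet ever sees it:

```yaml
# KubeletConfiguration fragment (assumption: kubelet reads a config file).
# Enables the gRPC probe handler that is alpha and off by default in v1.23.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  GRPCContainerProbe: true
# Assumption: kube-apiserver also needs
#   --feature-gates=GRPCContainerProbe=true
```

Changing feature gates cluster-wide is not always an option, which is why a chart-level setting would be preferable.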

If the probes were configurable, I could point them at the HTTP Prometheus server running on port 8081 or just null them out:

readinessProbe:
  httpGet:
    path: /metrics
    port: 8081
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
  successThreshold: 1
livenessProbe:
  httpGet:
    path: /metrics
    port: 8081
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
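
To make the request concrete, here is a sketch of what the override could look like in a values file. The `master.readinessProbe` and `master.livenessProbe` keys are hypothetical (they are what I am asking for, not something the chart is known to expose), and the chart template would need to render them into the master container spec:

```yaml
# values-override.yaml (hypothetical keys, for illustration only)
master:
  # Replace the gRPC readiness probe with an HTTP check
  # against the metrics endpoint.
  readinessProbe:
    httpGet:
      path: /metrics
      port: 8081
    initialDelaySeconds: 5
    periodSeconds: 10
  # Setting a value to null in Helm removes the chart default,
  # which would drop the liveness probe entirely.
  livenessProbe: null
```

When the chart is pulled in as a subchart (as in the gpu-operator case above), the same block would presumably sit under the subchart's key in the parent values.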

slyt · May 31 '24 16:05