`hcloud-csi-driver` container is coming up, but then failing the healthz check
TL;DR
hcloud-csi-driver container is coming up, but then failing the healthz check. Here are the logs that suggest things are healthy:
time=2025-06-16T08:43:48.785Z level=DEBUG source=/home/runner/work/csi-driver/csi-driver/internal/metrics/metrics.go:36 msg="registering metrics with registry"
time=2025-06-16T08:43:48.785Z level=DEBUG source=/home/runner/work/csi-driver/csi-driver/internal/metrics/metrics.go:43 msg="registered metrics"
--- Request:
GET /v1/servers?name=nes1-zpj HTTP/1.1
Host: api.hetzner.cloud
User-Agent: csi-driver/2.15.0 hcloud-go/2.21.1
Authorization: REDACTED
Accept-Encoding: gzip
--- Response:
HTTP/2.0 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: X-Requested-With,Authorization,Content-Type
Access-Control-Allow-Methods: OPTIONS,GET,POST,PUT,PATCH,DELETE
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Link,X-Correlation-ID
Access-Control-Max-Age: 86400
Content-Type: application/json
Date: Mon, 16 Jun 2025 08:43:49 GMT
Link: <https://api.hetzner.cloud/v1/servers?name=nes1-zpj&page=1>; rel=last
Ratelimit-Limit: 3600
Ratelimit-Remaining: 3599
Ratelimit-Reset: 1750063429
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-Correlation-Id: f8cbc0953ef1ffea
...
time=2025-06-16T08:43:49.183Z level=DEBUG source=/home/runner/work/csi-driver/csi-driver/internal/app/app.go:257 msg="fetched server via server name from KUBE_NODE_NAME env var" server-id=65738927
time=2025-06-16T08:43:49.183Z level=DEBUG source=/home/runner/work/csi-driver/csi-driver/cmd/controller/main.go:56 msg="evaluated default location for volumes" location=fsn1
Yet from inside the container, the health check fails. Subsequently, Kubernetes keeps restarting the container:
curl -v http://localhost:9808/
* Host localhost:9808 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
* Trying [::1]:9808...
* Connected to localhost (::1) port 9808
* using HTTP/1.x
> GET / HTTP/1.1
> Host: localhost:9808
> User-Agent: curl/8.14.1
> Accept: */*
>
* Request completely sent off
* Recv failure: Connection reset by peer
* closing connection #0
curl -v http://localhost:9808/metrics
* Host localhost:9808 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
* Trying [::1]:9808...
* Connected to localhost (::1) port 9808
* using HTTP/1.x
> GET /metrics HTTP/1.1
> Host: localhost:9808
> User-Agent: curl/8.14.1
> Accept: */*
>
* Request completely sent off
* Recv failure: Connection reset by peer
* closing connection #0
How can I see what the healthz endpoint is checking, and get accurate logs for why it is failing?
Expected behavior
A healthy status, or at least an error reporting why it is not healthy.
Observed behavior
Kubernetes triggering a restart loop.
Hi,
Could you please provide the steps to reproduce the issue? For example, details such as the CSI driver configuration, volume setup, the output of kubectl -n kube-system describe deployments.apps -l app.kubernetes.io/name=hcloud-csi, and the complete log output from the hcloud-csi-driver pod would be helpful.
Port 9808 is the port targeted by the livenessProbe of the hcloud-csi-driver container in the hcloud-csi-controller Pod. The probe uses the /healthz endpoint, not / or /metrics:
curl http://localhost:9808/healthz
This endpoint is provided by the kubernetes-csi/livenessprobe sidecar, which checks whether the CSI driver's Unix socket answers the Probe() gRPC call.
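If curl is not convenient, the same check can be scripted. The following is a minimal sketch, not part of the driver or the sidecar: the function name `probe_healthz` is made up for illustration, and it assumes port 9808 is reachable from where the script runs (for example inside the pod's network namespace or through a port-forward). It requests /healthz on both loopback addresses, since the curl output above connected over IPv6 (::1), and prints either the HTTP status and body or the connection error.

```python
import http.client
import urllib.error
import urllib.request


def probe_healthz(host="127.0.0.1", port=9808, timeout=3.0):
    """GET /healthz and return (healthy, detail). Hypothetical helper."""
    # Bracket IPv6 literals such as ::1 so the URL parses correctly.
    netloc = f"[{host}]" if ":" in host else host
    url = f"http://{netloc}:{port}/healthz"
    # Ignore any proxy settings from the environment; this is a local probe.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return True, f"{resp.status} {body}"
    except urllib.error.HTTPError as err:
        # The endpoint answered, but with a non-2xx status.
        body = err.read().decode("utf-8", errors="replace").strip()
        return False, f"{err.code} {body}"
    except (OSError, http.client.HTTPException) as err:
        # Connection refused or reset, timeout, malformed response, ...
        return False, f"{type(err).__name__}: {err}"


if __name__ == "__main__":
    # Try IPv4 and IPv6 loopback separately to see whether they behave differently.
    for host in ("127.0.0.1", "::1"):
        healthy, detail = probe_healthz(host)
        print(f"{host}: healthy={healthy} detail={detail}")
```

A non-2xx status with an error body would point at the Probe() call failing, while a connection error on only one address family would suggest looking at how the listener is bound.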
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.