Tests are run prematurely, before services start working.
Hello,
Consider the following k8s manifests, please:
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: stefanprodan
  namespace: default
spec:
  interval: 15m
  type: oci
  url: oci://ghcr.io/stefanprodan/charts
---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 15m
  chart:
    spec:
      chart: podinfo
      version: 6.5.4
      sourceRef:
        kind: HelmRepository
        name: stefanprodan
  releaseName: podinfo
  test:
    enable: true
  values:
    fullnameOverride: podinfo
    probes:
      startup:
        enable: true
Unfortunately, installing podinfo with this configuration fails, because the tests run before the Pod reports that it is ready to handle requests.
In the helm controller logs you can read:
{"level":"info","ts":"2024-01-24T17:15:53.541Z","msg":"running 'test' action with timeout of 5m0s","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514"}
{"level":"info","ts":"2024-01-24T17:15:57.961Z","msg":"release is in a failed state: release has test in failed phase","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514"}
{"level":"error","ts":"2024-01-24T17:15:57.972Z","msg":"Reconciler error","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514","error":"terminal error: exceeded maximum retries: cannot remediate failed release"}
The state of Pods in the namespace is:
NAME                       READY   STATUS    RESTARTS   AGE
podinfo-grpc-test-7mwyr    0/1     Error     0          8s
podinfo-5d6694644d-xgsbp   0/1     Running   0          8s
Logs from Pod podinfo-grpc-test-7mwyr:
timeout: failed to connect service "podinfo.default:9999" within 1s
Could you implement this so that testing only starts once all services report that they are ready to handle traffic?
Regards, Piotr Minkina
I would think such an implementation would need to be added on the chart side, as part of the actual testing logic, because this problem is not unique to the controller: it would also happen when you run helm test after a helm upgrade.
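To illustrate what chart-side waiting could look like, here is a minimal sketch of a Helm test hook Pod that retries until the service answers instead of failing on the first attempt. It is not taken from the podinfo chart: the Pod name `podinfo-wait-test`, the `curlimages/curl` image tag, the retry budget, and the `/readyz` path on port 9898 are all assumptions made for illustration.

```yaml
# Hypothetical test hook; names, image tag, and endpoint are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: podinfo-wait-test
  annotations:
    # Marks this Pod as a Helm test, so it runs on `helm test`.
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: wait-and-check
      image: curlimages/curl:8.5.0
      command: ["sh", "-c"]
      args:
        - |
          # Retry for up to ~60s so the test tolerates a service
          # that is still starting when the hook is launched.
          i=0
          until curl -fsS http://podinfo.default:9898/readyz; do
            i=$((i + 1))
            if [ "$i" -ge 30 ]; then
              echo "service did not become ready in time"
              exit 1
            fi
            sleep 2
          done
```

The retry loop keeps the waiting inside the test itself, so it behaves the same whether the test is started by the controller or by a manual helm test.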
As I read in the Chart Tests documentation, I must wait for all pods to become active before running tests. When Helm charts are applied declaratively, the controller should be the one doing the waiting.
I read the help for the helm install command, which says:
--wait if set, will wait until all Pods, PVCs, Services, and minimum number of Pods of a Deployment, StatefulSet, or ReplicaSet are in a ready state before marking the release as successful. It will wait for as long as --timeout
--wait-for-jobs if set and --wait enabled, will wait until all Jobs have been completed before marking the release as successful. It will wait for as long as --timeout
That sounds promising, so I added these flags to the helm install command and ran the tests as soon as helm install returned control.
$ helm install podinfo oci://ghcr.io/stefanprodan/charts/podinfo --version 6.5.4 --set probes.startup.enable=true --wait --wait-for-jobs && helm test podinfo
Pulled: ghcr.io/stefanprodan/charts/podinfo:6.5.4
Digest: sha256:a961643aa644f24d66ad05af2cdc8dcf2e349947921c3791fc3b7883f6b1777f
NAME: podinfo
LAST DEPLOYED: Wed Jan 24 19:56:28 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
1. Get the application URL by running these commands:
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl -n default port-forward deploy/podinfo 8080:9898
NAME: podinfo
LAST DEPLOYED: Wed Jan 24 19:56:28 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: podinfo-grpc-test-ibama
Last Started: Wed Jan 24 19:56:29 2024
Last Completed: Wed Jan 24 19:56:33 2024
Phase: Failed
NOTES:
1. Get the application URL by running these commands:
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl -n default port-forward deploy/podinfo 8080:9898
Error: 1 error occurred:
* pod podinfo-grpc-test-ibama failed
Unfortunately, the result is the same: the tests ran before the application Pods reported ready to receive traffic. I think this is simply a problem with the --wait flag, which does not seem to work as it should. What do you think, @hiddeco?
You need to set replicas to 2; Helm has a bug where it doesn't wait for a single pod to be ready.
@stefanprodan This bug you write about is reported somewhere? Is anyone fixing it? Increasing the replicas to 2 did indeed cause Helm to wait until the Pod was ready. Thanks! As a result, the grpc-test test ran correctly, but I don't know why the jwt-test test ended with an error and left no log behind (the exit code was 1).
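For reference, a sketch of how the replicas workaround could be expressed in the HelmRelease from the original report. It assumes that `replicaCount` is the podinfo chart value that controls the number of Deployment replicas; check the chart's values before relying on it.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 15m
  chart:
    spec:
      chart: podinfo
      version: 6.5.4
      sourceRef:
        kind: HelmRepository
        name: stefanprodan
  releaseName: podinfo
  test:
    enable: true
  values:
    fullnameOverride: podinfo
    # Assumption: replicaCount is the chart value for Deployment replicas.
    # With two replicas, Helm waited for readiness before the tests ran.
    replicaCount: 2
    probes:
      startup:
        enable: true
```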
This bug you write about is reported somewhere? Is anyone fixing it?
It's somewhere in the Helm repo, reported several years ago.
This didn't make it into Helm: https://github.com/helm/helm/pull/10831. There is nothing we can do in Flux about it.