Tests are run prematurely, before services start working.

Open piotrminkina opened this issue 1 year ago • 5 comments

Hello,

Consider the following k8s manifests, please:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: stefanprodan
  namespace: default
spec:
  interval: 15m
  type: oci
  url: oci://ghcr.io/stefanprodan/charts
---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 15m
  chart:
    spec:
      chart: podinfo
      version: 6.5.4
      sourceRef:
        kind: HelmRepository
        name: stefanprodan
  releaseName: podinfo
  test:
    enable: true
  values:
    fullnameOverride: podinfo
    probes:
      startup:
        enable: true

Unfortunately, installing podinfo with this configuration fails, because the tests run before the Pod reports that it is ready to handle requests.

The helm-controller logs show:

{"level":"info","ts":"2024-01-24T17:15:53.541Z","msg":"running 'test' action with timeout of 5m0s","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514"}
{"level":"info","ts":"2024-01-24T17:15:57.961Z","msg":"release is in a failed state: release has test in failed phase","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514"}
{"level":"error","ts":"2024-01-24T17:15:57.972Z","msg":"Reconciler error","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"e294060e-e52c-45d7-8f91-2972960a8514","error":"terminal error: exceeded maximum retries: cannot remediate failed release"}

The state of Pods in the namespace is:

NAME                       READY   STATUS    RESTARTS   AGE
podinfo-grpc-test-7mwyr    0/1     Error     0          8s
podinfo-5d6694644d-xgsbp   0/1     Running   0          8s

Logs from Pod podinfo-grpc-test-7mwyr:

timeout: failed to connect service "podinfo.default:9999" within 1s

Could you change the implementation so that tests start only after all services report that they are ready to handle traffic?
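To illustrate the kind of gate I have in mind, here is a minimal sketch in Python. It is not helm-controller code and not a proposed patch: the function name `wait_for_port` and its default timeouts are hypothetical, and a plain TCP connect is only a rough stand-in for a Kubernetes readiness check. It polls an endpoint until it accepts connections, and only then would the tests be started.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration only; timeout and interval
    defaults are assumptions, not values used by helm-controller.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait and try again.
            time.sleep(interval)
    return False
```

With the manifests above, the test for podinfo's gRPC port would start only if `wait_for_port("podinfo.default", 9999)` returned `True`, instead of failing after a single 1s connection attempt.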

Regards,
Piotr Minkina

piotrminkina, Jan 24 '24 17:01