source-controller

`HelmRelease` objects are temporarily initialised with an `ArtifactFailed` condition

Open alvarosanchez opened this issue 4 years ago • 5 comments

We're using Flux to install Helm charts, and we are seeing behaviour that looks suspicious. Suppose the following HelmRelease (the HelmRepository and Secret have already been created in a previous step):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata: 
  name: redis
  namespace: my-namespace
spec: 
  chart: 
    spec: 
      chart: redis
      interval: 1m0s
      sourceRef: 
        kind: HelmRepository
        name: harbor
      version: "14.6.6"
  install: 
    remediation: 
      retries: 3
  interval: 5m0s
  test: 
    enable: true
  valuesFrom: 
    - kind: Secret
      name: redis-values

When this object is created, within the same second (this is the precision of the k8s timestamps), we observe the following:

  • The object is created. No status conditions yet.
  • The following status condition is added: message: Reconciliation in progress, reason: Progressing, status: Unknown, type: Ready.
  • The above condition is removed, and this one is added: message: HelmChart 'xxx' is not ready, reason: ArtifactFailed, status: "False", type: Ready. In this change, failures: 1.
  • failures count is incremented to 2.

This all happens within the same second. All the condition timestamps are identical.

Then, a few seconds later, the failed condition is removed, and this one is added: message: Reconciliation in progress, reason: Progressing, status: Unknown, type: Ready.
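To make the flapping explicit, the Ready conditions described above can be written down as data. This is only a minimal sketch: the `ReadyCondition` type and the `transient_failures` helper are hypothetical names made up for illustration, and the values are copied from the report (timestamps omitted, since they were all identical).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReadyCondition:
    status: str  # "True", "False" or "Unknown"
    reason: str
    message: str


# Ready conditions in the order they were observed in the report above.
observed = [
    ReadyCondition("Unknown", "Progressing", "Reconciliation in progress"),
    ReadyCondition("False", "ArtifactFailed", "HelmChart 'xxx' is not ready"),
    ReadyCondition("Unknown", "Progressing", "Reconciliation in progress"),
]


def transient_failures(history: List[ReadyCondition]) -> List[ReadyCondition]:
    """Return the failed conditions that were later replaced by a non-failed one."""
    found = []
    for i, cond in enumerate(history):
        later = history[i + 1:]
        if cond.status == "False" and any(c.status != "False" for c in later):
            found.append(cond)
    return found


for cond in transient_failures(observed):
    print(f"transient failure: reason={cond.reason} message={cond.message}")
```

Anything that watches the object and reacts to `Ready` being `"False"` would be triggered by that middle entry, even though the release goes on to install normally.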

About a minute later, chart installation and reconciliation finish. The previous condition is removed, and the following are added:

- lastTransitionTime: "2021-07-15T15:45:29Z"
  message: Release reconciliation succeeded
  reason: ReconciliationSucceeded
  status: "True"
  type: Ready
- lastTransitionTime: "2021-07-15T15:45:26Z"
  message: Helm install succeeded
  reason: InstallSucceeded
  status: "True"
  type: Released
- lastTransitionTime: "2021-07-15T15:45:29Z"
  message: Helm test succeeded
  reason: TestSucceeded
  status: "True"
  type: TestSuccess

Why is this happening? In my opinion, the progressing condition should be the only one until it either reconciles successfully or actually fails.

alvarosanchez avatar Aug 20 '21 16:08 alvarosanchez

@alvarosanchez would you mind posting the events of the HelmRepository and the respective HelmChart here, too?

makkes avatar Aug 31 '21 09:08 makkes

Using this source:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: bitnami
spec:
  url: https://charts.bitnami.com/bitnami
  interval: 10m
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: wordpress
spec:
  interval: 5m
  chart:
    spec:
      chart: wordpress
      version: "11.0.11"
      sourceRef:
        kind: HelmRepository
        name: bitnami
      interval: 1m
  install:
    remediation:
      retries: 3
  test:
    enable: true  

Attaching events data:

helmrepositories.txt helmcharts.txt helmreleases.txt

alvarosanchez avatar Aug 31 '21 16:08 alvarosanchez

@hiddeco can we guard against this race condition in the HelmChartReconciler rewrite?

stefanprodan avatar Sep 01 '21 10:09 stefanprodan

I do not think this race condition comes from the HelmChartReconciler logic, but rather from the logic within the helm-controller, which resets all conditions to "progressing" as soon as it starts a new reconciliation run.

The proper solution, I think, would be not to reset the whole condition state at the beginning of the reconciliation, but instead to ensure the Reconciling condition type is present on the HelmRelease resource for as long as it takes for the HelmChart to become ready and the first installation to finish.

In case of an installation error, or if the HelmChart ends up in a finite failed state, the HelmRelease would then be marked as Stalled.
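As a rough illustration of the idea in the comment above, here is a toy Python model of one reconciliation pass. It is not helm-controller code: `Release`, `set_condition`, `reconcile` and its parameters are hypothetical names, and the reason strings are only illustrative. The sketch assumes that an unready chart keeps `Reconciling` present without touching other conditions, and that only a terminal failure sets `Stalled`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Condition:
    type: str
    status: str
    reason: str


@dataclass
class Release:
    # Conditions keyed by type; hypothetical stand-in for status.conditions.
    conditions: Dict[str, Condition] = field(default_factory=dict)

    def set_condition(self, cond_type: str, status: str, reason: str) -> None:
        self.conditions[cond_type] = Condition(cond_type, status, reason)

    def remove_condition(self, cond_type: str) -> None:
        self.conditions.pop(cond_type, None)


def reconcile(release: Release, chart_ready: bool,
              chart_failed_terminally: bool = False,
              install_ok: Optional[bool] = None) -> None:
    """One pass: never wipe all conditions up front, only adjust what changed."""
    if chart_failed_terminally or install_ok is False:
        # Terminal outcome: stop reporting progress and mark the release Stalled.
        # What happens to Ready here is not specified in the comment, so it is left alone.
        release.remove_condition("Reconciling")
        reason = "ArtifactFailed" if chart_failed_terminally else "InstallFailed"
        release.set_condition("Stalled", "True", reason)
        return
    if not chart_ready or install_ok is None:
        # Chart not ready yet, or install still running: keep Reconciling present.
        release.set_condition("Reconciling", "True", "Progressing")
        return
    # Chart ready and install finished successfully.
    release.remove_condition("Reconciling")
    release.set_condition("Ready", "True", "ReconciliationSucceeded")


release = Release()
reconcile(release, chart_ready=False)                   # chart still being fetched
print(sorted(release.conditions))
reconcile(release, chart_ready=True, install_ok=True)   # install finished
print(sorted(release.conditions))
```

In this model a HelmChart that is simply not ready yet never produces a `Ready: "False"` condition; a failure only surfaces once it is terminal.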

hiddeco avatar Sep 02 '21 14:09 hiddeco

So is this issue acknowledged? And if so, is there any chance of getting it fixed?

alvarosanchez avatar Oct 27 '21 20:10 alvarosanchez