helm-controller

Image Update Automation causes Helm Release to fail

Open scubakiz opened this issue 1 year ago • 1 comments

Describe the bug

When a new image is created, the Image Update Automation updates the HelmRelease in the source. This breaks the HelmRelease when the Helm Controller tries to update it.

{"level":"error","ts":"2022-07-06T06:38:33.774Z","logger":"controller.helmrelease","msg":"unable to update status after state update","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "atlas-helm-release": the object has been modified; please apply your changes to the latest version and try again"}

I don't know if the fact that the HelmRelease has multiple tags in it is breaking it:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: atlas-helm-release
  namespace: flux-system
spec:
  interval: 4m
  chart:
    spec:
      chart: ./Helm/atlas
      ....
  test:
    enable: false
  values:
    tags:
      imagedeleteworker: "1" # {"$imagepolicy": "flux-system:imagedeleteworker-image-policy:tag"}
      imageresizeworker: "3" # {"$imagepolicy": "flux-system:imageresizeworker-image-policy:tag"}
      sampleworker: "2" # {"$imagepolicy": "flux-system:sampleworker-image-policy:tag"}

All 3 images were replaced within the same reconcile cycle. Since different teams are working on these, this can happen often.

Also, the HelmRelease eventually "works", in that the image versions are updated in the Deployments and the Pods are replaced with the new images. However, the HelmRelease error never goes away and Slack is full of error messages with timestamps 10 minutes apart.

Steps to reproduce

Follow the directions to create a HelmRelease that uses a GitRepository source and an ImageRepository, ImagePolicy and ImageUpdateAutomation. These 3 work fine and update the Git repo. Then the HelmRelease stops working because it gets confused by the update.

Expected behavior

Should be able to update multiple image versions during the same reconcile cycle without confusing Git.

Screenshots and recordings

{"level":"error","ts":"2022-07-06T06:33:17.343Z","logger":"controller.helmrelease","msg":"unable to update status after reconciliation","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "atlas-helm-release": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"error","ts":"2022-07-06T06:33:17.343Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "atlas-helm-release": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"info","ts":"2022-07-06T06:38:33.719Z","logger":"controller.helmrelease","msg":"reconcilation finished in 5m16.375283203s, next run in 4m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system"}
{"level":"error","ts":"2022-07-06T06:38:33.719Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"Helm upgrade failed: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline"}
{"level":"error","ts":"2022-07-06T06:38:33.774Z","logger":"controller.helmrelease","msg":"unable to update status after state update","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "atlas-helm-release": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"error","ts":"2022-07-06T06:38:33.785Z","logger":"controller.helmrelease","msg":"unable to update status after reconciliation","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "atlas-helm-release": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"error","ts":"2022-07-06T06:38:33.786Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"atlas-helm-release","namespace":"flux-system","error":"Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "atlas-helm-release": the object has been modified; please apply your changes to the latest version and try again"}

OS / Distro

N/A

Flux version

v0.31.1

Flux check

► checking prerequisites
✗ flux 0.31.1 <0.31.3 (new version is available, please upgrade)
✔ Kubernetes 1.21.9 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.22.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.23.2
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.19.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.26.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.24.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.25.5
✔ all checks passed

Git provider

GitHub

Container Registry provider

Azure Container Registry

Additional context

By the way, it would be nice if the ImageUpdateAutomation checks/updates a ConfigMap with a values.yaml file instead of only scanning the HelmReleases.

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

scubakiz Jul 06 '22 06:07

The error "the object has been modified; please apply your changes to the latest version and try again" has nothing to do with image automation. The helm-controller runs into race conditions and fails to update the status subresource; the update will be retried automatically.
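For readers unfamiliar with this error: Kubernetes rejects a write that carries a stale resourceVersion, and the client is expected to re-read the object and try again. The sketch below is a toy model of that behaviour in plain Python, not helm-controller code. FakeStore, Conflict and update_status_with_retry are hypothetical names invented for illustration, and the real controller writes to a status subresource rather than to the whole object.

```python
class Conflict(Exception):
    """Raised when a write carries a stale resourceVersion."""


class FakeStore:
    """Toy stand-in for the API server (an assumption, not a real client)."""

    def __init__(self):
        self.obj = {"resourceVersion": 1, "spec": {"tag": "1"}, "status": ""}

    def get(self):
        # Return a copy so each caller holds its own snapshot.
        return {**self.obj, "spec": dict(self.obj["spec"])}

    def update(self, new):
        # Reject the write if the object changed since the caller read it.
        if new["resourceVersion"] != self.obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        self.obj = {
            **new,
            "spec": dict(new["spec"]),
            "resourceVersion": new["resourceVersion"] + 1,
        }


def update_status_with_retry(store, status, attempts=5):
    """Re-read the latest object and retry the write until it lands."""
    for attempt in range(1, attempts + 1):
        latest = store.get()
        latest["status"] = status
        try:
            store.update(latest)
            return attempt
        except Conflict:
            continue
    raise Conflict("gave up after %d attempts" % attempts)


store = FakeStore()
stale = store.get()          # the controller reads the object
other = store.get()          # another writer reads it as well
other["spec"]["tag"] = "2"
store.update(other)          # the other write lands first

stale["status"] = "Ready"
try:
    store.update(stale)      # stale resourceVersion, so this is rejected
    conflicted = False
except Conflict:
    conflicted = True

# Retrying against a fresh read succeeds and keeps the other writer's change.
attempts_used = update_status_with_retry(store, "Ready")
```

In this model the first write fails only because another writer got there first. A retry against the latest version goes through without losing the other change, which matches the "will be retried automatically" behaviour described above.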

By the way, it would be nice if the ImageUpdateAutomation checks/updates a ConfigMap with a values.yaml file instead of only scanning the HelmReleases.

Flux can only update YAML files that contain Kubernetes objects; any YAML file that has apiVersion and kind will work.

stefanprodan Jul 15 '22 10:07