
Error notifications despite the resource being successfully reconciled

Open Diaoul opened this issue 3 years ago • 5 comments

This issue was first opened at https://github.com/fluxcd/flux/issues/3480

Describe the bug

Flux sends out error-level notifications despite the resource being successfully reconciled. These are the Discord notifications I received, in this order:

[info] helmrelease/jellyfin.media
Helm upgrade has started
revision
7.3.2
[info] helmrelease/jellyfin.media
Helm upgrade succeeded
revision
7.3.2
[error] helmrelease/jellyfin.media
reconciliation failed: Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "jellyfin": the object has been modified; please apply your changes to the latest version and try again
revision
7.3.2

And when I checked later:

$ flux get helmrelease -n media jellyfin
NAME    	READY	MESSAGE                         	REVISION	SUSPENDED
jellyfin	True 	Release reconciliation succeeded	7.3.2   	False 

All of this happened within a two-minute window between the start of the reconciliation and the error notification.

To Reproduce

Hard to tell. No manual intervention was made besides updating the Docker image in the chart values in the GitOps repository; all of these resources are managed by Flux. The last time jellyfin was reconciled it worked fine. A week ago a grafana reconciliation hit the same error, but not since, so it does not seem to be tied to a particular Helm chart. My guess is that there is a conflict because Flux tries to run two reconciliations of the same resource at the same time.
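For reference, a quick way to double-check that the release actually converged and to find the conflicting update in the controller logs (a rough sketch only; it assumes the Flux controllers run in the default flux-system namespace):

# confirm the HelmRelease eventually reached Ready=True despite the error event
$ kubectl -n media get helmrelease jellyfin -o yaml

# look for the optimistic-concurrency conflict in the helm-controller logs
$ kubectl -n flux-system logs deploy/helm-controller | grep "object has been modified"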

Expected behavior

Error notifications should only be sent when reconciliation actually fails, perhaps only after the failure has persisted for some time. At the very least, the first occurrence could be reported at warning level. I am not sure what exactly should be done, but sending an error here seems wrong.

Diaoul avatar May 05 '21 18:05 Diaoul

Please post the output of flux check here

stefanprodan avatar May 06 '21 07:05 stefanprodan

Sure

► checking prerequisites
✔ kubectl 1.21.0 >=1.18.0-0
✔ Kubernetes 1.20.6+k3s1 >=1.16.0-0
► checking controllers
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.12.0
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.10.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.13.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.12.1
✔ all checks passed

Diaoul avatar May 06 '21 08:05 Diaoul

Any update on this? The same thing happens here. In chronological order:

Prerequisites:

  • A developer pushes a piece of code
  • The CI system tests/builds the code and pushes a new tagged image to ECR
  • A new tag is detected according to the configured ImageRepository and ImagePolicy
  1. The Image Update Automation controller commits the new tag to the Git repo
  2. The Source Controller fetches the updated repo (somewhere inside the Flux pod)
  3. Two events arrived at the same moment here: the Kustomize Controller was probably the first to update the tag value in the HelmRelease and send its event to the Slack channel, then the Helm Controller started the upgrade process
  4. The new image was successfully deployed.
  5. Failed ...why? What went wrong? kubectl get -o yaml helmrelease ...

I'd appreciate any help or ideas on how to debug this issue
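For anyone hitting the same thing, a minimal set of checks to walk the chain end to end (a sketch only; it assumes the controllers run in the default flux-system namespace, and the release name and namespace placeholders need to be replaced with your own):

# status of each stage in the image-update chain
$ flux get image update
$ flux get sources git
$ flux get kustomizations
$ flux get helmreleases --all-namespaces

# inspect the HelmRelease conditions and last attempted revision
$ kubectl -n <namespace> get helmrelease <name> -o yaml

# search the helm-controller logs for the conflict
$ kubectl -n flux-system logs deploy/helm-controller | grep "object has been modified"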

mikalai-t avatar Aug 25 '21 14:08 mikalai-t

I suspect it might be caused by two reconciliations happening at the same time: one triggered by the HelmRelease's interval, the second triggered by a source update. I've seen the same issue, but only from time to time; usually it's a bunch of releases updated properly and one or two that report success first and then the "object has been modified" error. Though I haven't seen such errors from Kustomizations, so maybe helm-controller treats an already-running reconciliation somewhat differently than kustomize-controller does?
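One rough way to check whether two reconciliations actually overlapped (assuming a release named jellyfin in the media namespace as in the original report; adjust to your own) is to list the events recorded for the HelmRelease sorted by time:

$ kubectl -n media get events \
    --field-selector involvedObject.kind=HelmRelease,involvedObject.name=jellyfin \
    --sort-by=.lastTimestamp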

Something similar is described in https://github.com/fluxcd/flux2/issues/1882; could they be connected?

tbondarchuk avatar Oct 08 '21 14:10 tbondarchuk

It seems this is still happening; at least it happened in 1/2 of our clusters. The upgrade went through fine, as described in this issue:

helmrelease/sde.sde
Helm upgrade has started
revision
2022.2.0-external2

helmrelease/sde.sde
Helm upgrade succeeded
revision
2022.2.0-external2

helmrelease/sde.sde
reconciliation failed: Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "sde": the object has been modified; please apply your changes to the latest version and try again
revision
2022.2.0-external2

helmrelease/sde.sde
reconciliation failed: Operation cannot be fulfilled on helmreleases.helm.toolkit.fluxcd.io "sde": the object has been modified; please apply your changes to the latest version and try again
revision
2022.2.0-external2

kustomization/helmcharts.flux-system
Health check passed in 20.121041632s
revision
main/7701d7768535a34ca4b53df88d822f65beecb4ed

jprecuch avatar Mar 21 '22 10:03 jprecuch