helm-controller
Helm upgrade suddenly fails with "has no deployed releases" and HelmRelease reconciliation fails
When the HelmRelease is reconciled, it sometimes starts failing with "Helm upgrade failed: project has no deployed releases". This happens roughly once every 2-3 days.
Q: Was the helm-controller restarted? A: No.
kubectl -n flux-system get pods helm-controller-6b456768d5-mcwcc
NAME READY STATUS RESTARTS AGE
helm-controller-6b456768d5-mcwcc 1/1 Running 0 2d5h
Project: fe-stack. Error:
flux get hr fe-stack -n qa-team
NAME READY MESSAGE REVISION SUSPENDED
fe-stack False Helm upgrade failed: "fe-stack" has no deployed releases False
kubectl describe helmreleases.helm.toolkit.fluxcd.io fe-stack -n qa-team
Last Helm logs:
preparing upgrade for fe-stack
resetting values to the chart's original version
performing update for fe-stack
creating upgraded release for fe-stack
Reason: UpgradeFailed
Status: False
Type: Released
Failures: 1
Helm Chart: qa-team/qa-team-fe-stack
Last Attempted Revision: v5.0.27
Last Attempted Values Checksum: 6a7c1ebdbb475c995c093799e1f3824926229e71
Last Handled Reconcile At: 2021-05-07T17:54:54.795085788+05:30
Observed Generation: 22
Upgrade Failures: 1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal error 19m (x19 over 120m) helm-controller reconciliation failed: Helm upgrade failed: "fe-stack" has no deployed releases
Normal info 9m53s (x30 over 29h) helm-controller Helm upgrade has started
Normal error 9m47s (x21 over 120m) helm-controller Helm upgrade failed: "fe-stack" has no deployed releases
Logs at debug level: hr-controller.log
Hi @stefanprodan, this is a production blocker for us going live with Flux v2. I can also contribute if that helps.
Thank you.
I suspect this is related to, or another variant of, #149, which has a long history of Helm
users themselves running into it:
https://github.com/helm/helm/issues/5595 https://github.com/helm/helm/issues/7160
Judging by the shared logs, there seems to be a correlation with Helm's --wait
behavior, which appears to corrupt the release storage in some edge cases.
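To make the failure mode concrete, here is a simplified Python model (not Helm's actual code; Helm versions differ in how they treat failed or pending revisions) of the lookup an upgrade depends on: the release history is searched for a revision with status "deployed", and if storage ends up with none, the upgrade is refused with the same message seen in the events above. The `Revision` type, the function name and both histories are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Revision:
    """One entry of a release's history (hypothetical, heavily simplified)."""
    version: int
    status: str  # e.g. "deployed", "superseded", "failed", "pending-upgrade"


class NoDeployedReleasesError(Exception):
    pass


def last_deployed(name: str, history: List[Revision]) -> Revision:
    """Return the newest revision whose status is "deployed".

    Simplified model: if no revision is marked deployed, refuse the upgrade
    with the same message the HelmRelease events show.
    """
    deployed = [r for r in history if r.status == "deployed"]
    if not deployed:
        raise NoDeployedReleasesError(f'"{name}" has no deployed releases')
    return max(deployed, key=lambda r: r.version)


# Healthy history: revision 3 is the live one.
healthy = [Revision(1, "superseded"), Revision(2, "superseded"), Revision(3, "deployed")]

# Hypothetical corrupted history: revision 2 was marked superseded,
# but revision 3 never reached "deployed", so nothing is live.
stuck = [Revision(1, "superseded"), Revision(2, "superseded"), Revision(3, "failed")]

print(last_deployed("fe-stack", healthy).version)  # 3
try:
    last_deployed("fe-stack", stuck)
except NoDeployedReleasesError as err:
    print(err)  # "fe-stack" has no deployed releases
```

The point of the model is that the error is about stored state, not the cluster: the workloads may be running fine while no stored revision carries the "deployed" status.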
It happens a lot during development. Is there any workaround that doesn't require deleting the HelmRelease?
It's a manual step, but this fixed the stuck release: https://github.com/helm/helm/issues/5595#issuecomment-717024123
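Before applying a manual fix like the one linked, it can help to see which revision is stuck. Assuming Helm 3's default Secret storage (records named `sh.helm.release.v1.<release>.v<revision>` and labelled with `owner=helm`, `name` and `status`), the records can be listed with `kubectl get secrets -n qa-team -l owner=helm,name=fe-stack -L status`. The sketch below takes those names and status labels as input and reports the newest revision and whether any revision is still marked deployed; `summarize`, `HistorySummary` and the sample data are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: Helm 3's Secret storage driver names each record
# sh.helm.release.v1.<release>.v<revision>.
SECRET_NAME = re.compile(r"^sh\.helm\.release\.v1\.(?P<release>.+)\.v(?P<revision>\d+)$")


@dataclass
class HistorySummary:
    latest: Optional[int]           # newest revision number found for the release
    latest_status: Optional[str]    # status label of that newest revision
    latest_deployed: Optional[int]  # newest revision labelled "deployed", if any


def summarize(release: str, secrets: List[Tuple[str, str]]) -> HistorySummary:
    """Summarize (secret name, status label) pairs for one release."""
    revisions = {}
    for name, status in secrets:
        match = SECRET_NAME.match(name)
        # Skip unrelated secrets and records belonging to other releases.
        if match and match.group("release") == release:
            revisions[int(match.group("revision"))] = status
    if not revisions:
        return HistorySummary(None, None, None)
    latest = max(revisions)
    deployed = [rev for rev, status in revisions.items() if status == "deployed"]
    return HistorySummary(latest, revisions[latest], max(deployed) if deployed else None)


# Hypothetical input, as it might be read from the NAME and STATUS columns.
secrets = [
    ("sh.helm.release.v1.fe-stack.v4", "superseded"),
    ("sh.helm.release.v1.fe-stack.v5", "failed"),
    ("sh.helm.release.v1.other-app.v2", "deployed"),
]
print(summarize("fe-stack", secrets))
# HistorySummary(latest=5, latest_status='failed', latest_deployed=None)
```

A `latest_deployed` of `None` corresponds to the state modelled above; the newest revision (here 5) is likely the record a manual fix has to deal with.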