operator-sdk
Helm release disappears - operator unable to uninstall release - "Release not found"
Bug Report
This issue happens with a Helm operator. We started seeing it recently, which makes us suspect a regression in either operator-sdk 1.33.0 or GKE 1.27.
What did you do?
- Create a custom resource as input to the operator with ArgoCD
- Wait for the operator to install the Helm release
- At this point, the helm release is visible and the corresponding secret present in the namespace.
- Delete the resource using ArgoCD
What did you expect to see?
- The Helm release should not disappear until all resources have been removed.
- After the CR has been removed, the operator should be able to properly remove the resources
What did you see instead? Under which circumstances?
- The Helm release disappears (`helm list` and `kubectl get secrets` both stop showing the release)
- The operator displays the following:
{"level":"info","ts":"2024-01-26T13:18:58Z","logger":"helm.controller","msg":"Release not found","namespace":"namespaces","name":"zoom-s001","apiVersion":"charts.symphony.com/v1alpha1","kind":"ExtendedNamespace","release":"zoom-s001"}
{"level":"info","ts":"2024-01-26T13:18:58Z","logger":"helm.controller","msg":"Removing finalizer","namespace":"namespaces","name":"zoom-s001","apiVersion":"charts.symphony.com/v1alpha1","kind":"ExtendedNamespace","release":"zoom-s001"}
- The resources deployed by the chart are still there, even though the CR has been fully deleted
- At this point we need to manually remove the leftovers.
Environment
Operator type:
Kubernetes cluster type:
GKE
$ operator-sdk version
1.33.0
$ go version (if language is Go)
1.21
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.10", GitCommit:"0fa26aea1d5c21516b0d96fea95a77d8d429912e", GitTreeState:"clean", BuildDate:"2024-01-17T13:46:28Z", GoVersion:"go1.20.13", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.7-gke.1121000", GitCommit:"4daab1fd78c0b9aba478a19b363ab4a25bdadd79", GitTreeState:"clean", BuildDate:"2023-11-06T09:24:38Z", GoVersion:"go1.20.10 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
Possible Solution
As a way to mitigate the impact: don't remove the finalizer in this case.
Additional context
Upgrade to GKE 1.27 (from 1.26) was done recently
I found a workaround using the uninstall-wait annotation described here: https://sdk.operatorframework.io/docs/building-operators/helm/reference/advanced_features/annotations/#helmsdkoperatorframeworkiouninstall-wait
This helps mitigate the impact, although the issue probably will remain.
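For reference, a minimal sketch of what the workaround looks like on a CR. The apiVersion, kind, name and namespace are taken from the operator logs above; the `spec` is left empty here because the real values are specific to the chart, and the `"true"` value is what the linked documentation describes for this annotation.

```yaml
apiVersion: charts.symphony.com/v1alpha1
kind: ExtendedNamespace
metadata:
  name: zoom-s001
  namespace: namespaces
  annotations:
    # Ask the Helm operator to wait until the release's resources are
    # actually deleted before it removes its finalizer from the CR.
    helm.sdk.operatorframework.io/uninstall-wait: "true"
spec: {}  # placeholder: chart values omitted
```

With this set, deletion of the CR should block until the resources listed in the deployed release manifest are gone. It does not explain why the release record disappears in the first place.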
Could you post what your CRs look like, both the operand and the ArgoCD one? Is it created with finalizers on the dependent resources, or with owner references? It's a bit hard to follow exactly what is happening from the information you've posted.
Hey we're going to close this due to inactivity. If you're still experiencing this issue please reopen it with the requested information so we can take a look at it.
Thanks!