helm-operator-plugins
helm-operator-plugins copied to clipboard
Apply updates from operator-sdk helm-operator
- [x] ~Uninstall bug: https://github.com/operator-framework/operator-sdk/pull/3431~
- Not applicable. The uninstall code in this repo already directly calls
uninstall.Runwithout any pre-checks for the existence of the release.
- Not applicable. The uninstall code in this repo already directly calls
- [x] Build info metric: https://github.com/operator-framework/operator-sdk/pull/4220
- Done in #67
- [x] Plugin docker invocation: https://github.com/operator-framework/operator-sdk/pull/4226
- Done in #68
- [x] Don't inject owner references for helm
keepresources: https://github.com/operator-framework/operator-sdk/pull/4389- Done in #71
- [x] operator-framework/operator-sdk#4358 when all non-deployed release records are purged, it appears that the helm app was never installed. To recover, the helm operator performed an install but failed so it did an uninstall as rollback which lead to data loss. Related to operator-framework/operator-sdk#4297
- Done in #81
- [x] ~Prevent nil pointer during uninstall: https://github.com/operator-framework/operator-sdk/pull/4288~
- Not applicable. See https://github.com/joelanford/helm-operator/issues/63#issuecomment-767068736
- [ ] operator-framework/operator-sdk#4347 in a more busy env like production, the helm operator delete doesn't cleanup all the resources. This PR is to hopefully improve on that.
- [ ] operator-framework/operator-sdk#4297 - skip rollback/uninstall if status.DeployedRelease is set, but install/upgrade failed
- [ ] Improve help text: https://github.com/operator-framework/operator-sdk/pull/4187
- [ ] Possible: improve handling of install errors: https://github.com/operator-framework/operator-sdk/pull/4297 (still unmerged in SDK repo as of 1/12/2021)
~PR for Build info metric: 67~ PR to add correct help text: kb-1830 ~PR for Plugin docker invocation: 68~
FYI There is also this https://github.com/operator-framework/operator-sdk/pull/4288 that was merged
If possible, could you please also consider https://github.com/operator-framework/operator-sdk/pull/4297 Due to the unintended uninstall rollback, some of our customers were hit with unintended data loss.
Thanks @mikeshng if there are any other Helm-related PRs that you're aware of, please post them.
We're fairly confident that this repo has everything up to SDK v1.0.0.
Our goal is to eventually replace the SDK helm-operator implementation with this one, and we want to make sure all the bug fixes and added features from the SDK repo make it here.
Thanks @joelanford for all your help. Two PRs that I am less confident about and I am looking for your team's expert opinions on:
- https://github.com/operator-framework/operator-sdk/pull/4358 I think this is what some of my customers ran into: all the release records were purged making it seems like the helm app was never installed. To recover, the helm operator performed an install but failed so it did an uninstall as rollback which lead to data loss. Related to https://github.com/operator-framework/operator-sdk/pull/4297
- https://github.com/operator-framework/operator-sdk/pull/4347 in a more busy env like production, the helm operator delete doesn't cleanup all the resources. This PR is to hopefully improve on that.
For operator-framework/operator-sdk#4288, is this just a defensive check, or is there actually a code path that results in a nil error and a nil UninstallReleaseResponse or nil Release? I'm not seeing a way for the existing Helm libraries to return both a nil Release and a nil error.
I don't think operator-framework/operator-sdk#4358 is applicable here. In this repo, there is nothing that deletes releases from the storage backend outside of running through an uninstall. manager.Sync has been mostly replaced by getReleaseState.
When a non-deployed release exists, it looks like this repo will attempt an upgrade rather than pre-deleting it and then attempting a clean install. If the last release is one of Deployed, Failed, or Superseded, then the upgrade will proceed. That seems reasonable in the context of the Helm operator too I think. I'll submit a PR here to make sure we attempt the upgrade in this scenario.
For operator-framework/operator-sdk#4288, is this just a defensive check, or is there actually a code path that results in a nil error and a nil
UninstallReleaseResponseor nilRelease? I'm not seeing a way for the existing Helm libraries to return both a nilReleaseand a nil error.
As I remember, I ran into a nil pointer in the old code. It was only doing:
uninstallResponse, err := uninstall.Run(m.releaseName)
return uninstallResponse.Release, err
So the primary goal of the fix was to avoid this nilpointer when uninstallResponse is nil.
As for the change if log.V(0).Enabled() && uninstalledRelease != nil { Yes, I think that's a defensive check that Eric suggested.
Re: operator-framework/operator-sdk#4288 - It looks like this is also not applicable in this repo since the client returns the uninstall response and then the caller correctly checks for errors before assuming the embedded release field is valid.
@joelanford if you have some cycles, could you please see my idea here and post a feedback message: https://lists.cncf.io/g/cncf-helm/message/351
I am trying to enhance the helm uninstall so the helm operator can leverage the new functionality and perform a more reliable resource cleanup upon CR delete. Thanks.
@varshaprasad96 please review to see where we are with this, close when done.
Went through the updates with respect to helm-operator in operator-sdk after SDK 1.0 release till v1.15.0. These are the following updates:
- [x] Add predicates for filtering : https://github.com/operator-framework/operator-sdk/pull/4997 - Taken care by #115
- [x] ~~Add option to watch list types~~ : https://github.com/operator-framework/helm-operator-plugins/pull/138
- [ ] Helm operator reconciliation diffs are now logged only at the zap debug level : https://github.com/operator-framework/operator-sdk/pull/5307/files - Discussed in #44
- [x] ~~support go text/template evaluation in override values~~ : https://github.com/operator-framework/operator-sdk/pull/5105
- [x] ~~fix sending empty patch requests to api server~~: https://github.com/operator-framework/operator-sdk/pull/4957/files (fixed by #139)
- [ ] add annotation to wait for to wait for all resources to be deleted before removing CR finalizer : https://github.com/operator-framework/operator-sdk/pull/4487/files
PS: These are updates made to the library, the plugin was updated recently and is already upto date.