helm-operator-plugins icon indicating copy to clipboard operation
helm-operator-plugins copied to clipboard

Apply updates from operator-sdk helm-operator

Open joelanford opened this issue 5 years ago • 12 comments

  • [x] ~Uninstall bug: https://github.com/operator-framework/operator-sdk/pull/3431~
  • [x] Build info metric: https://github.com/operator-framework/operator-sdk/pull/4220
    • Done in #67
  • [x] Plugin docker invocation: https://github.com/operator-framework/operator-sdk/pull/4226
    • Done in #68
  • [x] Don't inject owner references for helm keep resources: https://github.com/operator-framework/operator-sdk/pull/4389
    • Done in #71
  • [x] operator-framework/operator-sdk#4358 when all non-deployed release records are purged, it appears that the helm app was never installed. To recover, the helm operator performed an install but failed so it did an uninstall as rollback which lead to data loss. Related to operator-framework/operator-sdk#4297
    • Done in #81
  • [x] ~Prevent nil pointer during uninstall: https://github.com/operator-framework/operator-sdk/pull/4288~
    • Not applicable. See https://github.com/joelanford/helm-operator/issues/63#issuecomment-767068736
  • [ ] operator-framework/operator-sdk#4347 in a more busy env like production, the helm operator delete doesn't cleanup all the resources. This PR is to hopefully improve on that.
  • [ ] operator-framework/operator-sdk#4297 - skip rollback/uninstall if status.DeployedRelease is set, but install/upgrade failed
  • [ ] Improve help text: https://github.com/operator-framework/operator-sdk/pull/4187
  • [ ] Possible: improve handling of install errors: https://github.com/operator-framework/operator-sdk/pull/4297 (still unmerged in SDK repo as of 1/12/2021)

joelanford avatar Nov 12 '20 21:11 joelanford

~PR for Build info metric: 67~ PR to add correct help text: kb-1830 ~PR for Plugin docker invocation: 68~

anmol372 avatar Dec 16 '20 18:12 anmol372

FYI There is also this https://github.com/operator-framework/operator-sdk/pull/4288 that was merged

mikeshng avatar Jan 11 '21 22:01 mikeshng

If possible, could you please also consider https://github.com/operator-framework/operator-sdk/pull/4297 Due to the unintended uninstall rollback, some of our customers were hit with unintended data loss.

mikeshng avatar Jan 11 '21 22:01 mikeshng

Thanks @mikeshng if there are any other Helm-related PRs that you're aware of, please post them.

We're fairly confident that this repo has everything up to SDK v1.0.0.

Our goal is to eventually replace the SDK helm-operator implementation with this one, and we want to make sure all the bug fixes and added features from the SDK repo make it here.

joelanford avatar Jan 12 '21 14:01 joelanford

Thanks @joelanford for all your help. Two PRs that I am less confident about and I am looking for your team's expert opinions on:

  • https://github.com/operator-framework/operator-sdk/pull/4358 I think this is what some of my customers ran into: all the release records were purged making it seems like the helm app was never installed. To recover, the helm operator performed an install but failed so it did an uninstall as rollback which lead to data loss. Related to https://github.com/operator-framework/operator-sdk/pull/4297
  • https://github.com/operator-framework/operator-sdk/pull/4347 in a more busy env like production, the helm operator delete doesn't cleanup all the resources. This PR is to hopefully improve on that.

mikeshng avatar Jan 12 '21 14:01 mikeshng

For operator-framework/operator-sdk#4288, is this just a defensive check, or is there actually a code path that results in a nil error and a nil UninstallReleaseResponse or nil Release? I'm not seeing a way for the existing Helm libraries to return both a nil Release and a nil error.

joelanford avatar Jan 25 '21 01:01 joelanford

I don't think operator-framework/operator-sdk#4358 is applicable here. In this repo, there is nothing that deletes releases from the storage backend outside of running through an uninstall. manager.Sync has been mostly replaced by getReleaseState.

When a non-deployed release exists, it looks like this repo will attempt an upgrade rather than pre-deleting it and then attempting a clean install. If the last release is one of Deployed, Failed, or Superseded, then the upgrade will proceed. That seems reasonable in the context of the Helm operator too I think. I'll submit a PR here to make sure we attempt the upgrade in this scenario.

joelanford avatar Jan 25 '21 03:01 joelanford

For operator-framework/operator-sdk#4288, is this just a defensive check, or is there actually a code path that results in a nil error and a nil UninstallReleaseResponse or nil Release? I'm not seeing a way for the existing Helm libraries to return both a nil Release and a nil error.

As I remember, I ran into a nil pointer in the old code. It was only doing:

uninstallResponse, err := uninstall.Run(m.releaseName)
return uninstallResponse.Release, err

So the primary goal of the fix was to avoid this nilpointer when uninstallResponse is nil.

As for the change if log.V(0).Enabled() && uninstalledRelease != nil { Yes, I think that's a defensive check that Eric suggested.

mikeshng avatar Jan 25 '21 14:01 mikeshng

Re: operator-framework/operator-sdk#4288 - It looks like this is also not applicable in this repo since the client returns the uninstall response and then the caller correctly checks for errors before assuming the embedded release field is valid.

joelanford avatar Jan 25 '21 19:01 joelanford

@joelanford if you have some cycles, could you please see my idea here and post a feedback message: https://lists.cncf.io/g/cncf-helm/message/351 I am trying to enhance the helm uninstall so the helm operator can leverage the new functionality and perform a more reliable resource cleanup upon CR delete. Thanks.

mikeshng avatar Jan 27 '21 21:01 mikeshng

@varshaprasad96 please review to see where we are with this, close when done.

jmrodri avatar Nov 30 '21 16:11 jmrodri

Went through the updates with respect to helm-operator in operator-sdk after SDK 1.0 release till v1.15.0. These are the following updates:

  • [x] Add predicates for filtering : https://github.com/operator-framework/operator-sdk/pull/4997 - Taken care by #115
  • [x] ~~Add option to watch list types~~ : https://github.com/operator-framework/helm-operator-plugins/pull/138
  • [ ] Helm operator reconciliation diffs are now logged only at the zap debug level : https://github.com/operator-framework/operator-sdk/pull/5307/files - Discussed in #44
  • [x] ~~support go text/template evaluation in override values~~ : https://github.com/operator-framework/operator-sdk/pull/5105
  • [x] ~~fix sending empty patch requests to api server~~: https://github.com/operator-framework/operator-sdk/pull/4957/files (fixed by #139)
  • [ ] add annotation to wait for to wait for all resources to be deleted before removing CR finalizer : https://github.com/operator-framework/operator-sdk/pull/4487/files

PS: These are updates made to the library, the plugin was updated recently and is already upto date.

varshaprasad96 avatar Dec 06 '21 07:12 varshaprasad96