helm-controller
helm-controller copied to clipboard
Unable to delete namespace for HelmReleases impersonating ServiceAccount
When:
- A HelmRelease sets .spec.serviceAccountName to "default"
- A Namespace is set for termination
- The ServiceAccount is terminated before the HelmRelease automatically
- The HelmController attempts to delete the HelmRelease
- The Namespace never terminates
{"level":"error","ts":"2021-06-01T01:30:41.625Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"nginx-hello","namespace":"gatekeeper-test-296eaaf2","error":"could not impersonate ServiceAccount 'default': ServiceAccount \"default\" not found"}
{"level":"error","ts":"2021-06-01T01:31:22.901Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"nginx-hello","namespace":"gatekeeper-test-a2852804","error":"could not impersonate ServiceAccount 'default': ServiceAccount \"default\" not found"}
{"level":"error","ts":"2021-06-01T01:32:59.594Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"nginx-hello","namespace":"gatekeeper-test-d4cc7ecb","error":"could not impersonate ServiceAccount 'default': ServiceAccount \"default\" not found"}
{"level":"error","ts":"2021-06-01T01:36:09.305Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"nginx-hello","namespace":"gatekeeper-test-296eaaf2","error":"could not impersonate ServiceAccount 'default': ServiceAccount \"default\" not found"}
Hypothesis: The Helm Controller is behaving as expected by attempting to impersonate the SA, and failing the operation because the SA does not exist. This behaviour is unexpected as a a termination of a namespace should result in the deletion of helm resources as well.
Alternatives:
- Impersonate only for creation of resources, and not for deletion of resources
- If a SA cannot be found, detect if a namespace is marked for deletion, and delete the helm resources
Impersonate only for creation of resources, and not for deletion of resources
This would completely defeat the purpose of allowing a service account to be defined, as the idea is that this provides fine grain configuration over the RBAC scope for the HelmRelease.
If a SA cannot be found, detect if a namespace is marked for deletion, and delete the helm resources
This would be better, but still open some room to fiddle around with e.g. the DeletionTimestamp to try to trick the controller into doing something it shouldn't.
I think the better solution would be to add our finalizer to the ServiceAccount, so that it is not terminated before the HelmRelease, but only after we have performed a clean uninstall and removed our finalizer. The tricky part here is de-registration when a user changes the service account configuration, as keeping it around would result in the service account never terminating.
@hiddeco we can't add our finalizer to the ServiceAccount as it will break the new impersonation, Kubernetes Users are not objects so there is nothing to finalize.
To add a bit more context: we are in the process of reshaping our security model (details here: https://github.com/fluxcd/flux2/pull/582, first PoC: https://github.com/fluxcd/kustomize-controller/pull/349). With this model, the service account does not actually exist, but is just a name/group that exists as a RoleBinding.
Stefan's point is that we should not make any more changes to the current impersonation features as they stand, as there is a high chance of breaking changes in the (near) future.
In this example, the HelmRelease is dependent on some other in-namespace resources. (default SA) The issue is that an entire Namespace is being deleted before helm-controller has the ability to use the dependency.
It is probably important to note how or why the namespace is being deleted. There are a few use-cases.
- A Namespace is set for termination
The Namespace/defaultSA and dependent HelmRelease can be easily applied to the cluster using Flux's dependency features. They cannot be removed from the cluster in the proper order in an atomic way. This is a limitation with a related issue: https://github.com/fluxcd/kustomize-controller/issues/301
Solving that issue can help, but if you are just manually deleting the Namespace without respect for ordered, in-namespace dependency, the only things I can imagine that would help is Finalizers on the parent objects or an Admission Control Webhook that prevents deletes using some object DAG.
Regarding the missing SA in the new implementation, you would likely want to block on the existence of the pertinent RoleBindings.
WRT SA's, finalizing the SA is a good start but it's not enough if the necessary RoleBindings and Roles are deleted. Those should be considered part of the same lifecycle as the principal itslef.
It's possible to protect resources from deletion using Finalizers.
Perhaps helm-controller could look up RoleBindings pertinent to its impersonated principal and attempt to de-register any matching finalizers? ( ex: helm.toolkit.fluxcd.io/hr-namespace/hr-name )
It's hard to think of a U/X that makes this work well and is also performant. Perhaps helm-controller has an additional informer on RoleBindings and the behavior is opt-in via RoleBinding annotation(s).
Perhaps the Deletion Admission Controller or Webhook is an easier solution to reason about? Some admission controllers combine Finalizers into their implementation: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#storageobjectinuseprotection
It's worth reading this comment if you're parsing this thread: https://github.com/fluxcd/kustomize-controller/issues/301#issuecomment-871597112