helm-controller
helm-controller copied to clipboard
Changing targetNamespace causes orphaned HelmRelease
Describe the bug
If after deploying a HelmRelease
object the spec.targetNamespace
is later changed, the flux controller will create a new deployment, etc in the newly specified namespace. However, it fails to delete the old components. If the HelmRelease
is later deleted, Flux orphans the original deployment and does not delete it.
To Reproduce
Steps to reproduce the behaviour:
Install a HelmRelease
:
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
name: myapp
namespace: flux-system
spec:
chart:
spec:
chart: myapp
sourceRef:
kind: HelmRepository
name: myrepo
version: 0.1.0
interval: 1m0s
targetNamespace: mynamespace
Commit the config to git and let Flux deploy.
Later, change the spec.targetNamespace
value:
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
name: myapp
namespace: flux-system
spec:
chart:
spec:
chart: myapp
sourceRef:
kind: HelmRepository
name: myrepo
version: 0.1.0
interval: 1m0s
targetNamespace: mynewnamespace
Commit to git or apply ad-hoc to the cluster.
Expected behavior
Flux should create the deployment components in the new namespace, but it should also delete the components from the old namespace.
Additional context
I first hit this with v0.6.1, but I upgraded to v0.9.0 and the issue still exists.
- Kubernetes version: 1.17.15
- Git provider: Github
- Container registry provider: GCR
Below please provide the output of the following commands:
flux --version
flux check
kubectl -n <namespace> get all
kubectl -n <namespace> logs deploy/source-controller
kubectl -n <namespace> logs deploy/kustomize-controller
% flux --version
flux version 0.9.0
% flux check
► checking prerequisites
✔ kubectl 1.19.3 >=1.18.0-0
✔ Kubernetes 1.17.15-gke.800 >=1.16.0-0
► checking controllers
✔ helm-controller: healthy
► ghcr.io/fluxcd/helm-controller:v0.8.0
✔ kustomize-controller: healthy
► ghcr.io/fluxcd/kustomize-controller:v0.9.1
✔ notification-controller: healthy
► ghcr.io/fluxcd/notification-controller:v0.9.0
✔ source-controller: healthy
► ghcr.io/fluxcd/source-controller:v0.9.0
✔ all checks passed
Do you expect the garbage collection to happen before or after the install in the other namespace? I can imagine the effect can be quite different for some charts.
Ideally it would create the new resources in the new namespace then delete the ok'd resources. That is only because we presume that the services need to stay available. But I agree that could cause unintended consequences, and presumably if you're changing namespaces you understand that might cause a service disruption. I think the safe way might be to delete the old chart first, then install the new one.
The way this would work is that the <namespace>/<releaseName>
would be recorded, and then compared to the config in the HelmRelease
every reconciliation. This means that the feature does not just work for namespace changes, but also e.g. releaseName
changes.
Given this, I am concerned about the impact it may have when people accidentally edit the wrong fields. It would therefore likely require some flag to be set before the behavior is triggered.
Another concern I have is with replacing a release in the same namespace, if e.g. there is some secret name configured in the spec.values
it would likely not be possible to install the release before truncating the other one, as this would result in a rendered manifests contain a resource that already exists
error.
This issue was raised in today's Flux 🐛 scrub. Since there's not an obvious answer to how to handle this, I created a discussion from this issue: https://github.com/fluxcd/flux2/discussions/1691
@mbrancato @hiddeco would you be willing to summarize your thoughts there, and we will also ask other end users for additional feedback?
To tackle the change in the targetNamespace
field, my opinion is that the best option would be to recognize this change, create the new resources, and then delete the old resources once successfully changed. This only works for all namespaced resources. When it comes to global resources like CRDs, PV, Ingress, etc. - if these are present, they could be left in place. But if there is an unresolved conflict, Flux/helm-controller could always fail and explain to the user they must first delete the existing deployment. Delete before create would also work as an initial support for this prior to doing much inspection of the deployment to identify namespaced vs non-namespaced resources.
It may also be worthwhile to add a field that lets the deployment be configured to delete before create when required. If this field is set, Flux/helm-controller could go ahead and delete the old version and install the new version.