flux2
Error while running post render on files when using HelmRelease with local alias dependencies
Describe the bug
When using a HelmRelease with a chart downloaded from a Git source that has local alias dependencies, we are experiencing strange behaviour. The resulting error is: helm-controller reconciliation failed: Helm install failed: error while running post render on files: may not add resource with an already registered id:.
We could not find the root cause of the issue. While helm install, and helm template followed by kubectl apply, work perfectly, Flux can't make it work. There is some configuration that mitigates it, but it is a very broad workaround:
- Adding an {{- if .Values.enabled }} statement to each resource template. Note: it cannot be a simple if true; the value must be evaluated from external values.
- Adding the "enabled" value to the subchart values.yaml, set to false.
- Aliasing the subchart and adding the "enabled" value to the parent Helm chart values, once again set to false.
- Overriding the "enabled" value in the Flux HelmRelease values.
The "enabled" value can only be overridden (i.e., set to true) at the HelmRelease step. If this value is enabled at any step closer to the subchart, it will trigger the error. A rough sketch of this combination is shown below.
Steps to reproduce
- Create a Helm chart that has local dependencies referenced using aliases:
dependencies:
  - alias: mychart1
    name: mysubchart
    version: ">0.0.0"
  - alias: mychart2
    name: mysubchart
    version: ">0.0.0"
- Deploy the chart to the cluster using a HelmRelease with a GitRepository as the source.
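For context, a minimal pair of Flux manifests for this deployment might look roughly like the following (the repository URL, names, namespaces, and chart path are placeholders, not taken from the actual setup):

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: mychart-repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example/mychart-repo
  ref:
    branch: main
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: mychart
  namespace: default
spec:
  interval: 5m
  chart:
    spec:
      chart: ./            # path to the chart within the Git repository
      sourceRef:
        kind: GitRepository
        name: mychart-repo
        namespace: flux-system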
Expected behavior
The chart should be processed just as helm template or helm install processes it. There are no duplicates or resources with the exact same id.
Screenshots and recordings
No response
OS / Distro
EKS
Flux version
v2.1.1
Flux check
flux check
► checking prerequisites
✗ flux 2.1.1 <2.1.2 (new version is available, please upgrade)
✔ Kubernetes 1.27.6-eks-f8587cb >=1.25.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.35.0
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.35.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.29.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.0.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.0.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.0.1
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed
Git provider
GitHub
Container Registry provider
No response
Additional context
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I suspect that something like helm template ... | yq '(.kind + "/" + .metadata.name)' | uniq -c will tell you there is at least one resource with a duplicate declaration.
Nope, please don't make me repeat myself. I said there are no duplicates or resources with the exact same id.
helm template gives output with no duplicates. Moreover, helm install works perfectly.
What's more, following the steps I described in the Describe the bug section, without modifying anything else, makes it work.
To make sure there are no duplicates, I ran the command you proposed. Here is the output:
1 PodDisruptionBudget/myservice-sidekiq-default
1 ---
1 PodDisruptionBudget/myservice-sidekiq-urgent
1 ---
1 PodDisruptionBudget/myservice-web
1 ---
1 Service/myservice-web
1 ---
1 Deployment/myservice-sidekiq-default
1 ---
1 Deployment/myservice-sidekiq-urgent
1 ---
1 Deployment/myservice-web
1 ---
1 CronJob/myservice-cronjob
1 ---
1 Ingress/myservice-web
1 ---
1 ExternalSecret/myservice-postgres-access
1 ---
1 ExternalSecret/myservice-rails-key
1 ---
1 ExternalSecret/myservice-redis-access
1 ---
1 ExternalSecret/metrics-password
1 ---
1 PrometheusRule/myservice-web
1 ---
1 ScaledObject/myservice-sidekiq-default
1 ---
1 ScaledObject/myservice-sidekiq-urgent
1 ---
1 ServiceMonitor/myservice-web
The reason I am asking this is that, historically, any bug reported like this has ended up being a mistake in the chart, not a problem with the helm-controller.
The fact that helm template and helm install work as expected does not really change anything, given that the helm-controller uses Kustomize to inject labels that track the resources belonging to the HelmRelease, which makes it stricter when parsing the YAML produced by the chart.
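For illustration, the post-render label injection means each rendered object effectively carries labels along these lines before it is applied; the label keys shown are an assumption based on what helm-controller typically adds, not output captured from this cluster:

apiVersion: v1
kind: Pod
metadata:
  name: pod1
  labels:
    helm.toolkit.fluxcd.io/name: myservice       # name of the HelmRelease
    helm.toolkit.fluxcd.io/namespace: default    # namespace of the HelmRelease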
One other thing that comes to mind is that you may have some empty document defined in the final helm template output, e.g.
---
apiVersion: v1
...
---
# empty
---
apiVersion: v1
...
If that is not the case either, I cannot provide more help without being able to reproduce the issue myself (using e.g. your chart, or the output from helm template).
Ok, so I created a repo for this purpose with sample source files.
Putting it all together gives the following error:
helm-controller reconciliation failed: Helm install failed: error while running post render on files: may not add resource with an already registered id: Pod.v1.[noGrp]/pod1.[noNs]
while it's clear that there are only two resources and neither of them is a duplicate:
helm template .
---
# Source: sample-flux-fail/charts/just-one-pod1/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
---
# Source: sample-flux-fail/charts/just-one-pod2/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
Are there any estimates for this bug?
Hi!
I have exactly the same issue, with the following flux versions:
$ flux version
flux: v2.3.0
distribution: flux-v2.3.0
helm-controller: v1.0.1
kustomize-controller: v1.3.0
notification-controller: v1.3.0
source-controller: v1.3.0
I have a chart using a subchart, and the subchart is instantiated multiple times with aliases, exactly like @kgreczka-iteo mentioned.
My gut feeling is that Flux generates unique IDs for every resource, but when a Helm chart has subcharts with aliases, it uses the subchart name instead of its alias when generating the ID, thus leading to a conflict.
I don't have a fix (yet) but I wanted to confirm that this wasn't an isolated issue :)
+1 on this issue. I have the same versions as @jpetazzo
@kgreczka-iteo thanks for the repo, I was able to reproduce the issue locally. The bug is (somewhere) in source-controller: the chart artifact produced after consuming the GitRepository source looks very different from the original. Compare the two outputs below.
Original chart in the git repo:
Output from helm template ...:
---
# Source: sample-flux-fail/charts/just-one-pod1/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
---
# Source: sample-flux-fail/charts/just-one-pod2/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
Chart produced by source-controller after consuming the git repo:
Output from helm template ...:
---
# Source: sample-flux-fail/charts/just-one-pod1/charts/just-one-pod1/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
---
# Source: sample-flux-fail/charts/just-one-pod1/charts/just-one-pod2/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
---
# Source: sample-flux-fail/charts/just-one-pod1/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
---
# Source: sample-flux-fail/charts/just-one-pod2/charts/just-one-pod1/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
---
# Source: sample-flux-fail/charts/just-one-pod2/charts/just-one-pod2/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
---
# Source: sample-flux-fail/charts/just-one-pod2/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
The pods are indeed duplicated.
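Judging from the Source comments above, the chart artifact produced by source-controller appears to contain nested copies of both subcharts under each aliased subchart, roughly like this (reconstructed from the comments, not from inspecting the artifact directly):

sample-flux-fail/
  charts/
    just-one-pod1/
      templates/pod.yaml                        # pod1
      charts/
        just-one-pod1/templates/pod.yaml        # pod1 (duplicate)
        just-one-pod2/templates/pod.yaml        # pod2 (duplicate)
    just-one-pod2/
      templates/pod.yaml                        # pod2
      charts/
        just-one-pod1/templates/pod.yaml        # pod1 (duplicate)
        just-one-pod2/templates/pod.yaml        # pod2 (duplicate)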
By the way, there is a small bug in the example, @kgreczka-iteo: the name of the subchart is just-one-pod in the Chart.yaml, but the path is just-the-pod. Helm requires those two to be equal: https://github.com/helm/helm/blob/588041f6a55a8f23113f3c080d1733e65176fa0a/pkg/downloader/manager.go#L275-L278
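In other words, the directory name under charts/ must match the name field in that subchart's Chart.yaml; a minimal sketch with an illustrative name:

# charts/mysubchart/Chart.yaml
apiVersion: v2
name: mysubchart     # must equal the directory name "mysubchart"
version: 0.1.0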
In any case, the source-controller bug still happens even after fixing this.
I found the issue: the chart name is not taken into account when resolving a local dependency:
https://github.com/fluxcd/source-controller/blob/8d8e7cc982cd51ca05dbb900bf7652dc04b1248f/internal/helm/chart/dependency_manager.go#L299-L306
We ran into exactly the same issue yesterday; we will report back if this fixes it!