
Helm controller has incorrect behavior when dealing with duplicate environment variables

Open · daniel-anova opened this issue · 8 comments

Helm controller allows creating a Helm Release with a duplicate environment variable; it deploys with just a warning about the duplication.

However, when removing one of the duplicate values, Helm controller removes all copies of that environment variable.

So if the deployment has:

    env:
    - name: ihave
      value: "apple"
    - name: ihave
      value: "pen"

and values are set to:

  values:
    env:
    - name: ihave
      value: "apple"

The resulting deployment env becomes:

    env: []

The only way to restore the deployment is to delete the manifest and force reconciliation, or to add an additional environment variable.
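For reference, and assuming "the manifest" here means the Deployment rendered by the chart, the recovery I mean is roughly the following (names taken from the test case below):

    # delete the rendered Deployment, then force helm-controller to reconcile the HelmRelease again
    kubectl delete deployment bad-deployment-test
    flux reconcile helmrelease bad-deployment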

When trying to add duplicate values to an existing HelmRelease, however, the new duplicates are ignored.

    env:
    - name: ihave
      value: "apple"

and values are set to:

  values:
    env:
    - name: ihave
      value: "apple"
    - name: ihave
      value: "pen"

The resulting deployment env becomes:

    env:
    - name: ihave
      value: "apple"

The behavior should be consistent: Helm controller should either never allow the duplicate values, or deal with them correctly on removal.

Test case HelmRelease:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: bad-deployment
spec:
  releaseName: bad-deployment
  chart:
    spec:
      chart: bad-deployment
      sourceRef:
        kind: GitRepository
        name: bad-deployment
  interval: 5m
  install:
    remediation:
      retries: 3
  values:
    name: bad-deployment-test
    env:
    - name: ihave
      value: "apple"
    - name: ihave
      value: "pen"

Relevant part of the Helm chart template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.name }}
  labels:
    app: {{ .Values.name }}
    release: {{ .Release.Name }}
spec:
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: {{ .Values.name }}
  template:
    metadata:
      labels:
        app: {{ .Values.name }}
        release: {{ .Release.Name }}
    spec:
      containers:
      - name: {{ .Values.name }}
        image: "busybox:latest"
        command: ["/bin/sleep", "10d"]
        env:
          {{- if .Values.env}}
          {{- toYaml .Values.env | nindent 10 }}
          {{- end}}

daniel-anova commented on Sep 29 '22

Could you let us know what version of Flux is installed?

I'm looking at this issue and it reminds me of an issue that affected Flux's "new" Server Side Apply when it was still very new, but that's been resolved for some time. It might not be the same issue, but in the interest of ruling it out quickly, what is the output of flux check? (It includes the version information.)

kingdonb commented on Oct 05 '22

Also, maybe this: was there any error when the HelmRelease change was not successfully applied? (It seems the changes were ignored; maybe an error actually prevented the reconciliation from completing.)

E.g., at this point,

and values are set to:

  values:
    env:
    - name: ihave
      value: "apple"

The resulting deployment env becomes:

    env: []

it seems like the change perhaps hasn't actually been applied. Can you check whether there is any error reconciling the HelmRelease or its parent Kustomization that might explain why this happened?
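For example, something along these lines should surface any reconciliation errors (assuming the flux CLI is installed and the HelmRelease lives in your current namespace):

    # prints component versions along with the health checks
    flux check
    # the READY and MESSAGE columns will show a failed or stalled release
    flux get helmreleases -A
    # conditions and events for the HelmRelease from your test case
    kubectl describe helmrelease bad-deployment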

This could be a bug in Helm's three-way merge, or something else. Thank you for reporting this! I think you've given us enough information here to reproduce it, though it may take some time as we are working on issues for GA as well as KubeCon (but if this is a bug then obviously we'd like to squash it before GA).

kingdonb commented on Oct 05 '22

Could you let us know what version of Flux is installed?

I'm looking at this issue and it reminds me of an issue that affected Flux's "new" Server Side Apply when it was still very new, but that's been resolved for some time. It might not be the same issue, but in the interest of ruling it out quickly, what is the output of flux check? (It includes the version information.)

Currently on v0.24 of the Helm Controller.

it seems like the change perhaps hasn't actually been applied. Can you check whether there is any error reconciling the HelmRelease or its parent Kustomization that might explain why this happened?

Test manifests were applied directly with kubectl, so the Kustomize controller was not involved.

As for the Helm controller log, it shows nothing relevant, simply that reconciliation finished:

    time                 message
    2022-10-06 09:17:23  msg="reconcilation finished in 57.883487ms, next run in 5m0s"
    2022-10-06 09:15:30  msg="reconcilation finished in 45.435263ms, next run in 5m0s"
    2022-10-06 09:12:11  msg="reconcilation finished in 9.175023225s, next run in 5m0s"
    2022-10-06 09:10:43  msg="reconcilation finished in 12.716648708s, next run in 5m0s"
    2022-10-06 09:10:30  msg="reconcilation finished in 83.638857ms, next run in 5m0s"

daniel-anova commented on Oct 06 '22

I wonder if this is a bug in Flux at all, or whether it's actually a bug in Helm.

We need to see if this behavior reproduces in the Helm CLI if you try a similar thing; I'm wondering if it's somehow part of the three-way merge behavior. You mentioned that you did kubectl apply and aren't using the Kustomize controller; just to clarify, that could be part of the issue as well. If you did kubectl apply then it's using a three-way merge, and you need to be quite sure the HelmRelease that's represented on the cluster says what you think. It might have merged the values in the array.
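On the Helm CLI point: a rough way to rule Helm itself in or out, assuming two hypothetical values files that mirror your test case (one with the duplicate entry, one without) and the chart checked out locally, would be something like:

    # install with the duplicated env entry, then upgrade with the duplicate removed
    helm install bad-deployment ./bad-deployment -f values-with-duplicate.yaml
    helm upgrade bad-deployment ./bad-deployment -f values-without-duplicate.yaml
    # inspect what actually ends up in the rendered Deployment
    kubectl get deployment bad-deployment-test -o jsonpath='{.spec.template.spec.containers[0].env}'

If the env array also comes out empty here, that would point at Helm's upgrade/merge logic rather than at helm-controller.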

If you were using the Kustomize controller here: it does not use kubectl apply's three-way merge in quite the same way; Flux will actually wipe out any drift (unless you need three-way merge, and there's a way to opt into it). In the kubectl CLI, however, kubectl apply -f is a three-way merge by default, and you can get strange behavior from it. Broadly speaking, the server-side apply feature in the Kustomize controller fixes this.

It would help to confirm whether the YAML on the cluster has really been affected: one way is to use kubectl edit in between and verify the YAML looks how you expect; another way to mitigate the problem would be to see if it behaves the same when you do kubectl replace instead of kubectl apply.
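Concretely, something like this (the file name is just a placeholder for however you apply the HelmRelease manifest):

    # check what the API server actually has stored for the HelmRelease
    kubectl get helmrelease bad-deployment -o yaml
    # re-submit the full object instead of letting apply do a client-side three-way merge
    kubectl replace -f bad-deployment-helmrelease.yaml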

Can you check on these avenues so we can verify this is a bug before we spend more time investigating further?

kingdonb commented on Oct 15 '22

The HelmRelease was correct with both kubectl and the Kustomize controller. I just retested the test case to be 100% sure: when removing the duplicate value, the HelmRelease shows the remaining env variable as expected, and the same behavior is seen when using the controller.

The Helm-rendered Deployment gets all env variable entries removed, and further reconciliation does not add the env variable that remains in the HelmRelease back to the Deployment.

I hope that clarifies things.

daniel-anova commented on Oct 17 '22

Thanks @daniel-anova. I hope someone can follow this up before too long after KubeCon.

I won't be able to look at it again before then myself.

kingdonb commented on Oct 17 '22