
Breaking changes in Flux due to Kustomize v4

Open stefanprodan opened this issue 3 years ago • 32 comments

Starting with version 0.15.0, Flux and its controllers have been upgraded to Kustomize v4. While Kustomize v4 comes with many improvements and bug fixes, it introduces a couple of breaking changes.

Remote archives

Due to the removal of hashicorp/go-getter from Kustomize v4, the set of URLs accepted by Kustomize in the resources field is reduced to file system paths, URLs to plain YAML files, and values compatible with git clone.

This means you can no longer use resources from archives (zip, tgz, etc).

No longer works:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- https://github.com/rook/rook/archive/refs/heads/master.zip//rook-master/cluster/examples/kubernetes/ceph/crds.yaml

Works:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- https://raw.githubusercontent.com/rook/rook/v1.6.0/cluster/examples/kubernetes/ceph/crds.yaml

Non-string YAML keys

Due to a bug in Kustomize v4, if you have non-string keys in your manifests, the controller will fail to build the final manifest.

The non-string keys bug affects Helm releases like nginx-ingress, for example:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: nginx-ingress
spec:
  values:
    tcp:
      2222: "app/server:2222"

The above will fail with an error like json: unsupported type: map[interface {}]interface {}.

To fix this issue, you have to turn the YAML keys into strings, e.g.:

  values:
    tcp:
      "2222": "app/server:2222"

Duplicate YAML keys

Unlike Helm, the Kustomize YAML parser (kyaml) does not accept duplicate keys: while Helm drops the duplicates, Kustomize errors out. This impacts helm-controller, as it uses kustomize/kyaml to label objects reconciled by a HelmRelease.

For example, a chart that adds the app.kubernetes.io/name label more than once will result in a HelmRelease install failure:

map[string]interface {}(nil): yaml: unmarshal errors:
line 21: mapping key "app.kubernetes.io/name" already defined at line 20

YAML formatting

Due to a bug in Kustomize v4 that makes the image-automation-controller crash when YAMLs contain non-ASCII characters, we had to update the underlying go-yaml package to fix the panics.

The gopkg.in/yaml.v3 update means that the indentation style changed:

From:

spec:
  containers:
  - name: one
    image: image1:v1.0.0 # {"$imagepolicy": "automation-ns:policy1"}
  - name: two
    image: image2:v1.0.0 # {"$imagepolicy": "automation-ns:policy2"}

To:

spec:
  containers:
    - name: one
      image: image1:v1.0.0 # {"$imagepolicy": "automation-ns:policy1"}
    - name: two
      image: image2:v1.0.0 # {"$imagepolicy": "automation-ns:policy2"}

stefanprodan avatar Jun 15 '21 12:06 stefanprodan

Due to the removal of hashicorp/go-getter from Kustomize v4, the set of URLs accepted by Kustomize in the resources field is reduced to only file system paths or values compatible with git clone. This means you can no longer use resources from archives (zip, tgz, etc).

Does this mean standard URLs do not work anymore? e.g.

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  grafana_dashboard: "1"
resources:
- https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/v0.8.0/manifests/grafana-dashboardDefinitions.yaml

onedr0p avatar Jun 15 '21 12:06 onedr0p

@onedr0p I've updated the issue with examples; let me know if it answers your question.

stefanprodan avatar Jun 15 '21 13:06 stefanprodan

Due to the removal of hashicorp/go-getter from Kustomize v4...

🤦‍♂️

metasim avatar Jun 15 '21 13:06 metasim

I am pretty shocked at how easily the kustomize crowd breaks established standards, given how dogmatic they are about their templating philosophy. On that note, maybe flux should not include such massive breaking changes in minor releases.

From an operational perspective, this is a nightmare. I guess we will stay on flux 0.14.0 for some time until this has settled.

@stefanprodan Thank you for pushing back on this and clearly documenting the impacts. 👍

IsNull avatar Jun 22 '21 08:06 IsNull

maybe flux should not include such massive breaking changes in minor releases.

You may not be aware, but flux2 has had no GA release, so we can't bump the major version before going GA, aka 2.0.0. Every minor release of flux2 could come with breaking changes; we try to communicate those ahead of time. In the case of Kustomize v4, I documented the whole thing months ago here: https://github.com/fluxcd/flux2/issues/918

stefanprodan avatar Jun 22 '21 09:06 stefanprodan

I've run into the "Non-string YAML keys" problem, but in the context of Helm itself; more specifically, a Helm template has an integer key. As far as I understand, this is because of post-rendering kustomization, so basically helm-controller renders the Helm templates and then runs Kustomize on the output. Am I correct?

messiahUA avatar Jun 26 '21 10:06 messiahUA

The helm-controller does run a default Kustomize plugin to be able to trace resources that originate from a HelmRelease by adding labels.

The impact of this may, however, have been underestimated with the recent changes to Kustomize v4, and we may want to provide some sort of configuration flag to disable this default behavior for charts it does not cope with.
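For context, the labeling step is conceptually tiny. The sketch below is my own illustration of what such a post-render transform does, not helm-controller's actual code, and the exact label keys shown (helm.toolkit.fluxcd.io/name and helm.toolkit.fluxcd.io/namespace) are assumptions:

```python
def add_tracing_labels(manifest, release_name, release_namespace):
    """Attach labels so an object can be traced back to its HelmRelease.

    `manifest` is one rendered Kubernetes object as a plain dict; the
    label keys below are illustrative guesses at what the controller uses.
    """
    labels = manifest.setdefault("metadata", {}).setdefault("labels", {})
    labels["helm.toolkit.fluxcd.io/name"] = release_name
    labels["helm.toolkit.fluxcd.io/namespace"] = release_namespace
    return manifest
```

The point is that before this mutation can happen at all, the rendered chart has to round-trip through kyaml's strict parser, and that is where duplicate keys surface as errors.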

hiddeco avatar Jun 26 '21 15:06 hiddeco

I stumbled upon the "Duplicate YAML keys" problem in one of my releases. Fixing it is rather easy.

I'm a little concerned about how to avoid this kind of failure in the future. What is the test that needs to be added to CI so it would break before merging to master/develop?

I found a very awkward way to do it, but I wonder if someone found something more sustainable...
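One CI-friendly option is a script that scans rendered manifests for mapping keys repeated at the same indentation level, so the pipeline fails before the change ever reaches the cluster. The scanner below is a naive, hypothetical sketch I wrote for illustration (it ignores keys inside list items, block scalars, and flow syntax), not anything Flux ships; for real pipelines, piping the manifests through kustomize build or a strict YAML parser is more robust:

```python
def find_duplicate_keys(text):
    """Return (line, key) pairs where a mapping key repeats at the same
    indentation level within one block mapping.

    Naive by design: list items ("- ...") and block scalars are skipped,
    so this only catches the plain-mapping case kyaml complains about.
    """
    seen = {}        # indentation depth -> keys seen in the current mapping
    duplicates = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].rstrip()
        stripped = line.lstrip()
        if not stripped or stripped.startswith("-") or ":" not in stripped:
            continue
        indent = len(line) - len(stripped)
        key = stripped.split(":", 1)[0].strip()
        # Returning to a shallower indent closes all deeper mappings.
        for depth in [d for d in seen if d > indent]:
            del seen[depth]
        level = seen.setdefault(indent, set())
        if key in level:
            duplicates.append((lineno, key))
        else:
            level.add(key)
    return duplicates
```

Run against the duplicated app.kubernetes.io/name example from the issue body, it reports the repeated key and the line it recurs on, which is enough to fail a CI job.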

or-shachar avatar Aug 08 '21 11:08 or-shachar

@or-shachar I also have the same issue. How did you work around this? I need to update the serviceMonitor key in the values for the HelmRelease. Originally I did it this way:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: promtail
  namespace: log
spec:
  interval: 1h
  chart:
    spec:
      chart: promtail
      sourceRef:
        kind: HelmRepository
        name: grafana
        namespace: flux-system
  values:
    serviceMonitor:
      enabled: true
      labels:
        release: mon

But now I get the "Duplicate YAML keys" error...

akselleirv avatar Sep 01 '21 12:09 akselleirv

I have two initContainers in my deployment and I cannot proceed. (I can merge the commands, but still, this is one of the first apps I'm porting to Flux 2, and I don't want to guess what issues I'll find in the others.) Is this bug related?

 {"level":"error","ts":"2021-09-09T22:35:34.794Z","logger":"controller.helmrelease","msg":"Reconciler error","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"sendy","namespace":"sendy","error":"Helm upgrade failed: error while running post render on
│  files: map[string]interface {}(nil): yaml: unmarshal errors:\n  line 93: mapping key \"name\" already defined at line 91\n  line 107: mapping key \"name\" already defined at line 105"}
      initContainers:
        - name: create-csvs
          image: "{{ .Values.image }}"
          command:
          - mkdir
          - -p
          - /var/www/html/sendy/uploads/csvs
          volumeMounts:
          - name: data
            mountPath: /var/www/html/sendy/uploads
            name: sendy-data
        - name: take-data-dir-ownership
          image: "{{ .Values.image }}"
          command:
          - chown
          - -R
          - www-data:www-data
          - /var/www/html/sendy/uploads
          volumeMounts:
          - name: data
            mountPath: /var/www/html/sendy/uploads
            name: sendy-data

masterkain avatar Sep 09 '21 22:09 masterkain

@masterkain:

is this bug related?

No. That is an issue with your YAML: you define the volumeMount's name twice in both initContainers.

          volumeMounts:
          - name: data #### Here...
            mountPath: /var/www/html/sendy/uploads
            name: sendy-data #### ... and here.

endrec avatar Sep 10 '21 07:09 endrec

thanks @endrec, that definitely slipped under my tired eyes 👍

masterkain avatar Sep 10 '21 07:09 masterkain

Looks like this is fixed upstream now: kubernetes-sigs/kustomize#3675

jeinwag avatar Sep 23 '21 11:09 jeinwag

Can't wait for the update to this newer version of kustomize in Flux, anchor support is amazing.

onedr0p avatar Sep 23 '21 11:09 onedr0p

And there's already a new kustomize release which includes the fix: https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv4.4.0

jeinwag avatar Sep 28 '21 06:09 jeinwag

@stefanprodan Here's the comment as per your request

I noticed this by accident, and so far I've been lucky that it hasn't caused issues, but I think it's just a matter of time.

kustomize wraps lines longer than 80 characters in the resulting YAML manifests, meaning from the 81st character onward the line continues on a new line. This still happens with the latest version of kustomize (4.4.0).

There is a PR open on the kustomize repo to fix this, but it's missing something before it's ready to be merged:

https://github.com/kubernetes-sigs/kustomize/pull/4222

francesco-beccaria avatar Oct 21 '21 13:10 francesco-beccaria

@stefanprodan if I use this version of kustomize:

➜ kustomize version
{Version:kustomize/v4.4.0 GitCommit:63ec6bdb3d737a7c66901828c5743656c49b60e1 BuildDate:2021-09-27T16:13:36Z GoOs:darwin GoArch:amd64}

I do not get the dupe key error. But with current flux v0.20.1, I am seeing the dupe key error on an ingress-nginx spec.

➜ flux get kustomization ingress-nginx
NAME         	READY	MESSAGE                                                                                                                                                                                                                                                                                                          	REVISION                                     	SUSPENDED
ingress-nginx	False	Deployment/ingress-nginx/ingress-nginx-controller dry-run failed, error: failed to create manager for existing fields: failed to convert new object (apps/v1, Kind=Deployment) to smd typed: .spec.template.spec.containers[name="controller"].ports: duplicate entries for key [containerPort=80,protocol="TCP"]	main/5f7e56ed5328481798d7feff415036e220d32178	False

However, the section of the spec it is complaining about does not have a dupe entry:

          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
            - name: tohttps
              containerPort: 2443
              protocol: TCP
            - name: webhook
              containerPort: 8443
              protocol: TCP

It used to, though. Before, it was specified like this:

          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort:  80
              protocol: TCP
            - name: tohttps
              containerPort: 2443
              protocol: TCP
            - name: webhook
              containerPort: 8443
              protocol: TCP

...but I pushed a commit to change the port for https from 80 to 443, then ran flux get kustomization ingress-nginx --with-source, but it is still complaining about the same duplicate key entry. Could it be cached?

We're getting umpteen million alerts in a slack chan on this, so I'm trying to make it go away.

davisford avatar Nov 04 '21 16:11 davisford

Hi @stefanprodan, we are affected by the duplicate keys issue. We are actually using Helm charts from another company we work with. Would there maybe be a way to ignore those errors?

jerem0808 avatar Nov 11 '21 07:11 jerem0808

Would there maybe be a way to ignore those errors?

No, there is no workaround. I think Helm itself will soon error out the same way as Flux, once they update their YAML processor package.

stefanprodan avatar Nov 11 '21 07:11 stefanprodan

I stumbled upon the "Duplicate YAML keys" problem in one of my releases. Fixing it is rather easy.

I'm a little concerned about how to avoid this kind of failure in the future. What is the test that needs to be added to CI so it would break before merging to master/develop?

I found a very awkward way to do it, but I wonder if someone found something more sustainable...

and what is your way?

marianobilli avatar Dec 08 '21 10:12 marianobilli

I really need a workaround for charts that have lots of duplicate keys in them, like GitLab.

@or-shachar can you share your method please? @marianobilli have you found a workaround?

I guess one option might be to use a Kustomize PostRenderer to patch all of the affected YAMLs, but that would take AGES.

EDIT: Tried fixing with a Kustomize PostRenderer and it doesn't seem to work; it errors out before even attempting the patch.

JVMartin avatar Dec 21 '21 23:12 JVMartin

I really need a workaround for charts that have lots of duplicate keys in them, like GitLab.

@or-shachar can you share your method please? @marianobilli have you found a workaround?

I guess one option might be to use a Kustomize PostRenderer to patch all of the affected YAMLs, but that would take AGES.

EDIT: Tried fixing with a Kustomize PostRenderer and it doesn't seem to work; it errors out before even attempting the patch.

I had to fix the helm template with the duplicate keys.

marianobilli avatar Dec 22 '21 00:12 marianobilli

I just upgraded to v0.29.0 and noticed that a kustomization like this is no longer supported. Is this safe to assume now?

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/rancher/system-upgrade-controller?ref=v0.9.1

I looked into it a bit more and discovered this PR https://github.com/kubernetes-sigs/kustomize/pull/4453, which makes it seem like the below would work:

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - git::https://github.com/rancher/system-upgrade-controller?ref=v0.9.1

But I am getting an error:

kustomize build failed: accumulating resources: accumulation err='accumulating resources from 'system-upgrade': read /tmp/apps2651878754/cluster/apps/system-upgrade: is a directory': recursed accumulation of path '/tmp/apps2651878754/cluster/apps/system-upgrade': accumulating resources: accumulation err='accumulating resources from 'system-upgrade-controller': read /tmp/apps2651878754/cluster/apps/system-upgrade/system-upgrade-controller: is a directory': recursed accumulation of path '/tmp/apps2651878754/cluster/apps/system-upgrade/system-upgrade-controller': accumulating resources: accumulation err='accumulating resources from 'git::https://github.com/rancher/system-upgrade-controller?ref=v0.9.1': open /tmp/apps2651878754/cluster/apps/system-upgrade/system-upgrade-controller/git::https:/github.com/rancher/system-upgrade-controller?ref=v0.9.1: no such file or directory': fs-security-constraint abs /tmp/kustomize-356128291: path '/tmp/kustomize-356128291' is not in or below '/tmp/apps2651878754'

onedr0p avatar Apr 20 '22 11:04 onedr0p

@onedr0p this is a newly introduced security constraint that was set too tight. I will get this sorted now, and ensure a regression test is added.

hiddeco avatar Apr 20 '22 12:04 hiddeco

I have a confirmed fix, but still need to write more extensive tests. I will aim to have it available before EOD UTC.

hiddeco avatar Apr 20 '22 13:04 hiddeco

A computer is currently doing the required work to produce a ghcr.io/fluxcd/kustomize-controller:v0.24.1 image. Once it is available, I will patch the Flux CLI as soon as CI allows me to. If you are impatient, manually patching the kustomize-controller version in your Git repository would be a workaround.

hiddeco avatar Apr 20 '22 15:04 hiddeco

Users running into issues after updating to v0.29.0 should see smooth operation again with v0.29.1. Sorry for any inconvenience it may have caused; the Terraform provider release will follow shortly.

hiddeco avatar Apr 20 '22 16:04 hiddeco

Hi, it's day 1 for me as a new user, and I'm wondering if this report belongs here. I am also seeing a duplicate key error.

❯ flux get kustomizations --watch
NAME       	REVISION                                     	SUSPENDED	READY	MESSAGE                                                                                                                                                                                                   
flux-system	main/bb9797709c06bde527638850c0d29e91475bb057	False    	False	Node/k0 dry-run failed, error: failed to create manager for existing fields: failed to convert new object (/v1, Kind=Node) to smd typed: .status.addresses: duplicate entries for key [type="InternalIP"]

I do indeed have multiple InternalIP addresses.

❯ k get node k0 -o json | jq '.status.addresses'
[
  {
    "address": "172.16.15.20",
    "type": "InternalIP"
  },
  {
    "address": "fc15::20",
    "type": "InternalIP"
  },
  {
    "address": "k0",
    "type": "Hostname"
  }
]

zachfi avatar May 06 '22 17:05 zachfi

I'm getting a similar error with flux version 0.30.2. It used to work with 0.28.0.

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
  - ../../base
patchesStrategicMerge:
  - ../../gotk-patches.yaml
✗ accumulating resources: accumulation err='accumulating resources from '../../base': fs-security-constraint read /tmp/flux-bootstrap-3660114182/clusters/base: path '/tmp/flux-bootstrap-3660114182/clusters/base' is not in or below '/tmp/flux-bootstrap-3660114182/clusters/staging'': fs-security-constraint abs /tmp/flux-bootstrap-3660114182/clusters/base: path '/tmp/flux-bootstrap-3660114182/clusters/base' is not in or below '/tmp/flux-bootstrap-3660114182/clusters/staging'

tomaszduda23 avatar May 14 '22 19:05 tomaszduda23

@tomaszduda23 can you confirm the directory structure in https://github.com/fluxcd/kustomize-controller/pull/657 matches yours? As the tests for this appear to pass.

hiddeco avatar May 23 '22 09:05 hiddeco