
Kustomization trying to apply wrong version of terraform object

Open nab-gha opened this issue 2 years ago • 16 comments

I am repeatedly hitting issues with a Kustomization that applies a Terraform object. The Terraform object is at version v1alpha2, but the Kustomization intermittently tries to apply it as v1alpha1, which causes an error and removes fields that are new in v1alpha2.

flux version
flux: v2.0.0-rc.5
helm-controller: v0.34.1
kustomize-controller: v1.0.0-rc.4
notification-controller: v1.0.0-rc.4
source-controller: v1.0.0-rc.5

See https://github.com/ww-gitops/paulcarlton-ww-macbook/blob/main/infra/dynamic/eks-config/eks-config.yaml#L1 for terraform yaml specifying v1alpha2

The Kustomization is shown below; it is generated by GitOpsSets.

paulc:paulcarlton-ww-macbook paulc [sandbox]$ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -n test-two test-eks-config-c1 -o yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  annotations:
    reconcile.fluxcd.io/requestedAt: "2023-06-08T12:05:09.935189+01:00"
    templates.weave.works/create-request: ""
  creationTimestamp: "2023-06-07T09:05:43Z"
  finalizers:
  - finalizers.fluxcd.io
  generation: 1
  labels:
    templates.weave.works/name: test-eks-config
    templates.weave.works/namespace: test-two
  name: test-eks-config-c1
  namespace: test-two
  ownerReferences:
  - apiVersion: templates.weave.works/v1alpha1
    kind: GitOpsSet
    name: test-eks-config
    uid: 83b72996-e275-40c7-be94-73ce81ca27e7
  resourceVersion: "394234"
  uid: 0b905535-275c-4f0c-8f93-7694132ac391
spec:
  dependsOn:
  - name: test-eks-c1
  force: false
  interval: 1m0s
  path: ./infra/dynamic/eks-config
  postBuild:
    substitute:
      clusterName: c1
      desired_size: '"2"'
      resourceName: test
      target_path: clusters/test-two/test/
      templateNamespace: test-two
    substituteFrom:
    - kind: ConfigMap
      name: cluster-config
      optional: false
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  timeout: 5m
  wait: true
status:
  conditions:
  - lastTransitionTime: "2023-06-08T11:05:10Z"
    message: Detecting drift for revision main@sha1:1f3037dfe7ef90888acae6b5d6dce8b690d5dac4
      with a timeout of 5m0s
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-06-08T11:05:10Z"
    message: |
      Terraform/test-two/eks-config-test-c1 dry-run failed, error: failed to prune fields: failed add back owned items: failed to convert merged object at version infra.contrib.fluxcd.io/v1alpha1: .spec.runnerPodTemplate.spec.hostAliases: field not declared in schema
    observedGeneration: 1
    reason: ReconciliationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-06-08T11:04:38Z"
    message: 'timeout waiting for: [Terraform/test-two/eks-config-test-c1 status:
      ''InProgress'']'
    observedGeneration: 1
    reason: HealthCheckFailed
    status: "False"
    type: Healthy
  inventory:
    entries:
    - id: test-two_eks-config-test-c1_infra.contrib.fluxcd.io_Terraform
      v: v1alpha2
  lastAppliedRevision: main@sha1:55d8d6ac2e5dfba25fc177e904f1ee0bb1db4cdb
  lastAttemptedRevision: main@sha1:1f3037dfe7ef90888acae6b5d6dce8b690d5dac4
  lastHandledReconcileAt: "2023-06-08T12:05:09.935189+01:00"
  observedGeneration: 1

nab-gha avatar Jun 08 '23 13:06 nab-gha

What does the CRD on the cluster look like? And do you still get this error with kubectl apply --dry-run=server -f <yaml file>?

somtochiama avatar Jun 08 '23 13:06 somtochiama

I suspect tf-controller does a status update using v1alpha1, which reverts the version in etcd, and then the API returns that error. Flux applies v1alpha2, according to its inventory record. I guess you’re running tf-controller at an older version that only knows about v1alpha1.

stefanprodan avatar Jun 08 '23 14:06 stefanprodan
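
One way to check the suspicion in the comment above is to look at which versions the CRD has stored and which API version each field manager last wrote with. The following is only a sketch: the CRD name `terraforms.infra.contrib.fluxcd.io` is an assumption inferred from the group and kind in the inventory entry, and the function names are hypothetical helpers, not part of Flux or tf-controller.

```shell
# Assumed CRD name, inferred from the group and kind in the inventory entry.
TF_CRD="terraforms.infra.contrib.fluxcd.io"

# Print the versions at which objects of this CRD have ever been persisted.
stored_versions() {
  "${KUBECTL:-kubectl}" get crd "$TF_CRD" -o 'jsonpath={.status.storedVersions}'
}

# Dump one Terraform object including metadata.managedFields, where each
# entry records a manager name and the apiVersion it last wrote with.
managed_fields() {
  ns="$1"
  name="$2"
  "${KUBECTL:-kubectl}" get "$TF_CRD" "$name" -n "$ns" --show-managed-fields -o yaml
}
```

For the object in this issue the call would be `managed_fields test-two eks-config-test-c1`. A managedFields entry whose apiVersion is infra.contrib.fluxcd.io/v1alpha1 would be consistent with a client still writing at the older version.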

I am definitely running the latest version of the controller image: ghcr.io/weaveworks/tf-controller:v0.15.0-rc.4

nab-gha avatar Jun 08 '23 14:06 nab-gha

tf-crd.txt

nab-gha avatar Jun 08 '23 14:06 nab-gha

I have restarted the kustomize-controller and even blown the Kubernetes cluster away and recreated it (it is running Docker Kubernetes on my MacBook).

nab-gha avatar Jun 08 '23 14:06 nab-gha

> What does the CRD on the cluster look like? And do you still get this error with kubectl apply --dry-run=server -f <yaml file>?

I haven't tried that; I assume it will work. I can work around this by suspending the Kustomization, deleting the Terraform custom resource, and then resuming the Kustomization.

nab-gha avatar Jun 08 '23 14:06 nab-gha
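
The workaround described in the comment above could be scripted roughly as follows. This is a sketch under assumptions: the resource name `terraforms.infra.contrib.fluxcd.io` is inferred from the inventory entry, `recreate_terraform` is a hypothetical helper, and deleting the custom resource may have side effects depending on how the Terraform object is configured.

```shell
# Sketch of the suspend / delete / resume workaround.
# FLUX and KUBECTL can be overridden (for example with echo) to preview
# the commands without touching the cluster.
recreate_terraform() {
  ks_name="$1"   # Kustomization name, e.g. test-eks-config-c1
  tf_name="$2"   # Terraform object name, e.g. eks-config-test-c1
  ns="$3"        # namespace shared by both, e.g. test-two

  # Stop kustomize-controller from reconciling while the object is removed.
  "${FLUX:-flux}" suspend kustomization "$ks_name" -n "$ns" || return 1
  # Assumed resource name: plural "terraforms" in group infra.contrib.fluxcd.io.
  "${KUBECTL:-kubectl}" delete terraforms.infra.contrib.fluxcd.io "$tf_name" -n "$ns" || return 1
  # Resuming lets the Kustomization recreate the object from the Git source.
  "${FLUX:-flux}" resume kustomization "$ks_name" -n "$ns"
}
```

Running `FLUX=echo KUBECTL=echo recreate_terraform test-eks-config-c1 eks-config-test-c1 test-two` prints the three commands instead of executing them.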

> I assume it will work

Assumptions do not provide us with useful information to further help you with your case.

hiddeco avatar Jun 08 '23 15:06 hiddeco

> > I assume it will work
>
> Assumptions do not provide us with useful information to further help you with your case.

OK, I've captured the object YAML and will reapply it using kubectl the next time it happens. Note that it always works through the Kustomization the first time (i.e. when the Terraform custom resource does not exist), and often when reapplying.

nab-gha avatar Jun 08 '23 16:06 nab-gha

kubectl apply works fine

nab-gha avatar Jun 08 '23 16:06 nab-gha

Well, the command @somtochiama posted overrides the resources; that’s not what Flux does. We use a server-side apply dry-run.

stefanprodan avatar Jun 08 '23 16:06 stefanprodan

I just did kubectl apply -f ...

I can add --dry-run=server the next time I have to delete and re-add it.

nab-gha avatar Jun 08 '23 16:06 nab-gha

The command is kubectl apply --server-side --field-manager=kustomize-controller --dry-run=server. Without the field manager it will fail; without --server-side it overrides. cc @somtochiama

stefanprodan avatar Jun 08 '23 17:06 stefanprodan
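
Spelled out with a manifest file, the command from the comment above might be wrapped like this. It is a sketch only: `ssa_dry_run` is a hypothetical helper name and the file argument is whatever YAML you captured. Note that the Kustomization in this issue uses postBuild substitutions, so the file in Git may differ from what kustomize-controller actually applies.

```shell
# Server-side apply dry-run using the same field manager as kustomize-controller.
# KUBECTL can be overridden (for example with echo) to preview the command.
ssa_dry_run() {
  manifest="$1"   # path to the YAML file to test
  "${KUBECTL:-kubectl}" apply --server-side \
    --field-manager=kustomize-controller \
    --dry-run=server \
    -f "$manifest"
}
```

For example, `ssa_dry_run eks-config.yaml` sends the object to the API server without persisting it, which should surface the same conversion error if the problem is reproducible at that moment.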

Anyway, you may want to report this in the tf-controller repo. IMO there is nothing to do in Flux: we don’t change the API version between sequential applies. If that were the case, everything would fail, since we have bumped Flux APIs many times before.

stefanprodan avatar Jun 08 '23 17:06 stefanprodan

Will do. @chanwit FYI

nab-gha avatar Jun 08 '23 17:06 nab-gha

I tried adding --dry-run=server to kubectl apply; it generated the correct YAML.

nab-gha avatar Jun 08 '23 18:06 nab-gha

Raised https://github.com/weaveworks/tf-controller/issues/656

nab-gha avatar Jun 08 '23 18:06 nab-gha