Kustomization trying to apply the wrong version of a Terraform object
I am repeatedly hitting issues with a Kustomization that applies a Terraform object. The Terraform object is version v1alpha2, but the Kustomization intermittently tries to apply it as v1alpha1, causing an error and removing fields that are new in v1alpha2.
flux version
flux: v2.0.0-rc.5
helm-controller: v0.34.1
kustomize-controller: v1.0.0-rc.4
notification-controller: v1.0.0-rc.4
source-controller: v1.0.0-rc.5
See https://github.com/ww-gitops/paulcarlton-ww-macbook/blob/main/infra/dynamic/eks-config/eks-config.yaml#L1 for the Terraform YAML specifying v1alpha2.
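For context, here is a minimal sketch of the kind of v1alpha2 Terraform object involved. Field names follow the tf-controller v1alpha2 API, but the names and values below are placeholders rather than the actual contents of the linked file:

# Hypothetical sketch of a v1alpha2 Terraform object; values are placeholders
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: eks-config-test-c1
  namespace: test-two
spec:
  interval: 1m
  approvePlan: auto
  path: ./infra/dynamic/eks-config
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  runnerPodTemplate:
    spec:
      # hostAliases exists in v1alpha2 but not in v1alpha1, which is why the
      # conversion error below complains about this field
      hostAliases:
      - ip: "127.0.0.1"
        hostnames:
        - "example.local"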
The Kustomization is shown below; it is generated by GitOpsSets.
paulc:paulcarlton-ww-macbook paulc [sandbox]$ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -n test-two test-eks-config-c1 -o yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  annotations:
    reconcile.fluxcd.io/requestedAt: "2023-06-08T12:05:09.935189+01:00"
    templates.weave.works/create-request: ""
  creationTimestamp: "2023-06-07T09:05:43Z"
  finalizers:
  - finalizers.fluxcd.io
  generation: 1
  labels:
    templates.weave.works/name: test-eks-config
    templates.weave.works/namespace: test-two
  name: test-eks-config-c1
  namespace: test-two
  ownerReferences:
  - apiVersion: templates.weave.works/v1alpha1
    kind: GitOpsSet
    name: test-eks-config
    uid: 83b72996-e275-40c7-be94-73ce81ca27e7
  resourceVersion: "394234"
  uid: 0b905535-275c-4f0c-8f93-7694132ac391
spec:
  dependsOn:
  - name: test-eks-c1
  force: false
  interval: 1m0s
  path: ./infra/dynamic/eks-config
  postBuild:
    substitute:
      clusterName: c1
      desired_size: '"2"'
      resourceName: test
      target_path: clusters/test-two/test/
      templateNamespace: test-two
    substituteFrom:
    - kind: ConfigMap
      name: cluster-config
      optional: false
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  timeout: 5m
  wait: true
status:
  conditions:
  - lastTransitionTime: "2023-06-08T11:05:10Z"
    message: Detecting drift for revision main@sha1:1f3037dfe7ef90888acae6b5d6dce8b690d5dac4
      with a timeout of 5m0s
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-06-08T11:05:10Z"
    message: |
      Terraform/test-two/eks-config-test-c1 dry-run failed, error: failed to prune fields: failed add back owned items: failed to convert merged object at version infra.contrib.fluxcd.io/v1alpha1: .spec.runnerPodTemplate.spec.hostAliases: field not declared in schema
    observedGeneration: 1
    reason: ReconciliationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-06-08T11:04:38Z"
    message: 'timeout waiting for: [Terraform/test-two/eks-config-test-c1 status:
      ''InProgress'']'
    observedGeneration: 1
    reason: HealthCheckFailed
    status: "False"
    type: Healthy
  inventory:
    entries:
    - id: test-two_eks-config-test-c1_infra.contrib.fluxcd.io_Terraform
      v: v1alpha2
  lastAppliedRevision: main@sha1:55d8d6ac2e5dfba25fc177e904f1ee0bb1db4cdb
  lastAttemptedRevision: main@sha1:1f3037dfe7ef90888acae6b5d6dce8b690d5dac4
  lastHandledReconcileAt: "2023-06-08T12:05:09.935189+01:00"
  observedGeneration: 1
What does the CRD on the cluster look like?
And do you still get this error with kubectl apply --dry-run=server -f <yaml file>?
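For example, assuming the default tf-controller CRD name, something like this should show which versions the CRD serves and which version is stored in etcd:

# list the served versions of the Terraform CRD (CRD name assumes a standard tf-controller install)
kubectl get crd terraforms.infra.contrib.fluxcd.io -o jsonpath='{.spec.versions[*].name}{"\n"}'
# list the storage versions recorded in the CRD status
kubectl get crd terraforms.infra.contrib.fluxcd.io -o jsonpath='{.status.storedVersions}{"\n"}'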
I suspect tf-controller does a status update using v1alpha1, which reverts the stored version in etcd, and the API server then returns that error. Flux applies v1alpha2 according to its inventory record. I guess you're running tf-controller at an older version which only knows about v1alpha1.
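One way to check that theory (a sketch, using the resource name from the status above) is to look at the object's managedFields and see which apiVersion each field manager last wrote with:

# print each field manager and the apiVersion it used on the Terraform object
kubectl get terraforms.infra.contrib.fluxcd.io eks-config-test-c1 -n test-two \
  --show-managed-fields \
  -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\t"}{.apiVersion}{"\n"}{end}'

If tf-controller shows up there with infra.contrib.fluxcd.io/v1alpha1 while kustomize-controller shows v1alpha2, that would point at the controller as the source of the mixed versions.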
I am definitely running the latest version of the controller image: ghcr.io/weaveworks/tf-controller:v0.15.0-rc.4
I have restarted the kustomize-controller and even blown the Kubernetes cluster away and recreated it (it is running Docker Kubernetes on my MacBook).
What does the CRD on the cluster look like? And do you still get this error with kubectl apply --dry-run=server -f <yaml file>?
I haven't tried that; I assume it will work. I can work around this by suspending the Kustomization, deleting the Terraform custom resource, and then resuming the Kustomization.
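For reference, that workaround is roughly the following (a sketch using the names from the Kustomization and Terraform object above):

# suspend the Kustomization so it stops reconciling
flux suspend kustomization test-eks-config-c1 -n test-two
# delete the Terraform custom resource that is stuck in the wrong version
kubectl delete terraforms.infra.contrib.fluxcd.io eks-config-test-c1 -n test-two
# resume the Kustomization so it recreates the object from Git
flux resume kustomization test-eks-config-c1 -n test-two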
I assume it will work
Assumptions do not provide us with useful information to further help you with your case.
OK, I've captured the object YAML and will reapply it using kubectl the next time it happens. Note that it always works via the Kustomization the first time (i.e. when the Terraform custom resource does not exist) and often when reapplying.
kubectl apply works fine
Well, the command @somtochiama posted overrides the resources; that's not what Flux does. We use server-side apply dry-run.
I just did kubectl apply -f ...
I can add --dry-run=server the next time I have to delete and re-add it.
The command is kubectl apply --server-side --field-manager=kustomize-controller --dry-run=server. Without the field manager it will fail; without --server-side it overrides. cc @somtochiama
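Against the captured object YAML that would look something like this (the file name is a placeholder):

# server-side apply dry-run using the same field manager as Flux's kustomize-controller
kubectl apply --server-side --field-manager=kustomize-controller --dry-run=server -f eks-config.yaml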
Anyway, you may want to report this in the tf-controller repo. IMO there is nothing to do in Flux; we don't change the API version between sequential applies. If that were the case, everything would fail, since we have bumped the Flux APIs many times before.
Will do @chanwit fyi
I tried adding --dry-run=server to kubectl apply; it generated the correct YAML.
Raised https://github.com/weaveworks/tf-controller/issues/656