[flux v0.26.0-2] Kustomization tries to modify immutable fields
Describe the bug
I updated to flux 0.26.1 and then observed a reconciliation error in a deployment. I deleted the deployment, and now there is a problem with the PVC of that deployment.
PersistentVolumeClaim/monitoring/influxdb-volume dry-run failed, reason: Invalid, error: PersistentVolumeClaim "influxdb-volume" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
core.PersistentVolumeClaimSpec{
AccessModes: {"ReadWriteOnce"},
Selector: nil,
Resources: {Requests: {s"storage": {i: {...}, s: "10Gi", Format: "BinarySI"}}},
- VolumeName: "",
+ VolumeName: "pvc-c7f9929e-2741-43ce-b690-ed00816092ad",
StorageClassName: &"aws-gp2-dynamic",
VolumeMode: &"Filesystem",
DataSource: nil,
}
I tried to downgrade the kustomization controller, but that did not resolve the issue.
Steps to reproduce
- Install flux 0.25.3
- create a PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb-volume
  labels:
    kustomize.toolkit.fluxcd.io/prune: disabled
spec:
  storageClassName: aws-gp2-dynamic
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
- Update flux to 0.26.1
Expected behavior
Should work after the update
Screenshots and recordings
No response
OS / Distro
Ubuntu 21
Flux version
flux: v0.26.1
Flux check
► checking prerequisites
✔ Kubernetes 1.20.11-eks-f17b81 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.16.0
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.20.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.16.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.19.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.21.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.21.1
✔ all checks passed
Git provider
Gitlab
Container Registry provider
Gitlab
Additional context
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I'm going to try to reproduce this, but first could you please add some more information about how you created the PVC?
Is it just: create a PVC (without volumeName) through Flux 0.25.3, then upgrade to a different version?
(I did not run into this issue in my testing, possibly because I am always creating PVCs with volumeName set.)
I updated the issue. I use PVCs with dynamically provisioned storage classes on AWS. As a workaround I removed the PVC from the Kustomization.
Can you please post the output of:
kubectl get pvc influxdb-volume -oyaml --show-managed-fields
Also, there are some serious formatting errors in the YAML that you posted, and it's missing a namespace. This would not work in Flux. Can you post the YAML exactly as it was committed to your Flux repository?
I tried to replicate your scenario here:
https://github.com/kingdonb/fleet-testing/blob/main/apps/keycloak/example-pvc.yaml
Installed Flux 0.25.3, added a PVC and a pod bound to it so the PV would be created... omitted volumeName from my spec, as it appears you have (and most people would do)
Upgraded to Flux 0.26.1
I am not seeing any errors across the Kubernetes versions in my test matrix. If the issue is resolved for you, and since I cannot reproduce it, we'll have to close this unless you can provide more information (or somebody else reports the same issue).
Thanks again for your report.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: ip-192-168-41-199.eu-central-1.compute.internal
    volume.kubernetes.io/storage-resizer: kubernetes.io/aws-ebs
  creationTimestamp: "2020-10-02T16:43:43Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    kustomize.toolkit.fluxcd.io/name: infrastructure
    kustomize.toolkit.fluxcd.io/namespace: flux-system
    kustomize.toolkit.fluxcd.io/prune: disabled
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:pv.kubernetes.io/bind-completed: {}
          f:pv.kubernetes.io/bound-by-controller: {}
          f:volume.beta.kubernetes.io/storage-provisioner: {}
          f:volume.kubernetes.io/selected-node: {}
          f:volume.kubernetes.io/storage-resizer: {}
        f:finalizers:
          .: {}
          v:"kubernetes.io/pvc-protection": {}
        f:labels:
          .: {}
          f:kustomize.toolkit.fluxcd.io/namespace: {}
          f:kustomize.toolkit.fluxcd.io/prune: {}
      f:spec:
        f:accessModes: {}
        f:resources:
          f:requests:
            .: {}
            f:storage: {}
        f:storageClassName: {}
        f:volumeMode: {}
        f:volumeName: {}
      f:status:
        f:accessModes: {}
        f:capacity:
          .: {}
          f:storage: {}
        f:phase: {}
    manager: kustomize-controller
    operation: Apply
    time: "2021-10-27T11:39:29Z"
  name: influxdb-volume
  namespace: monitoring
  resourceVersion: "456920544"
  uid: c7f9929e-2741-43ce-b690-ed00816092ad
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: aws-gp2-dynamic
  volumeMode: Filesystem
  volumeName: pvc-c7f9929e-2741-43ce-b690-ed00816092ad
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound
I have a kustomization.yaml file which defines the namespace for all resources
f:volumeName: {}
This suggests that the Flux YAML in your git repo contains the volumeName field in its spec. Is that the case?
No.
Maybe the kustomization adds this field somehow?
It is possible the Flux manager took the field over when it shouldn't have, I'm not sure.
In my cluster, I have this:
- apiVersion: v1
  fieldsType: FieldsV1
  fieldsV1:
    f:metadata:
      f:annotations:
        f:pv.kubernetes.io/bind-completed: {}
        f:pv.kubernetes.io/bound-by-controller: {}
        f:volume.beta.kubernetes.io/storage-provisioner: {}
        f:volume.kubernetes.io/storage-provisioner: {}
    f:spec:
      f:volumeName: {}
  manager: kube-controller-manager
  operation: Update
  time: "2022-02-04T15:35:52Z"
If flux finds fields managed by kube-controller-manager, it does not manage or take them over, as I understand it.
My cluster is kind + local-path-provisioner; I can try a different storage class provider, as this might not be representative.
(The best bet is probably for me to try replicating this on EKS next, since that's your environment...)
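To see which manager currently claims the volumeName field on the affected PVC, something like the following should work (a rough sketch, assuming jq is installed; the namespace and PVC name are the ones from this thread):
# List the managedFields entry (if any) that claims f:spec/f:volumeName
kubectl -n monitoring get pvc influxdb-volume -o json --show-managed-fields \
  | jq '.metadata.managedFields[] | select(.fieldsV1."f:spec"."f:volumeName" != null) | {manager, operation, time}'
If this prints an entry with manager: kustomize-controller and operation: Apply, Flux has taken ownership of the field, which matches the dump posted above.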
I have Kubernetes 1.20 and the volume is from 2020; back then it did not use server-side apply.
I have a test suite that runs on Flux 0.17.2 and upgrades it to the current version; I can use that to try to replicate the issue. If your volumes are that old, they might behave differently – we resolved a number of issues like that to bring out 0.26.0, issues you can only see if you started with Flux before server-side apply and upgraded through 0.18.x-0.25.x.
Like this issue:
- https://github.com/fluxcd/kustomize-controller/issues/486
I might actually need to start with a Kubernetes version from before server-side apply to reliably reproduce all of these kinds of issues. If kube-controller-manager uses server-side apply, or if the field gets captured in managedFields however that happens nowadays, then it won't matter what version of Flux initially created the PVC in the cluster; I won't see the issue reproduced...
I confirmed that Kubernetes 1.20 with Flux 0.17.2 together with local-path-provisioner produces a PVC that looks like this:
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rancher.io/local-path
  creationTimestamp: "2022-02-04T17:52:28Z"
  finalizers:
  - kubernetes.io/pv-protection
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:phase: {}
    manager: kube-controller-manager
    operation: Update
    time: "2022-02-04T17:52:28Z"
In other words, K8s 1.20.x is not old enough to satisfy the required test conditions. I'm looking into trying to create a K8s cluster @1.15.x, install Flux 0.17.x on it, ... but that will not work, since Flux hasn't supported K8s versions < 1.16.0 since Flux2 v0.0.7
This is an extreme case :)
I don't think we should balk at it though... I think we'll need to upgrade a cluster from K8s 1.15.x to know for sure what happens when the cluster has resources that were initially created before server-side apply had even reached beta.
SSA was marked beta in 1.16.x. It might be easier and just as effective to test against a cluster with 1.16.x, and just ensure that the beta SSA feature is turned off before the volume is created. (That way, I don't also need to start with Flux v1...)
I've spent too much time on this today, but I think you have likely got something here. We may need to publish an advisory for clusters that were in production before a certain threshold date. Hopefully there's a way we can reconcile this more broadly.
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rancher.io/local-path
  creationTimestamp: "2022-02-04T17:52:28Z"
  finalizers:
  - kubernetes.io/pv-protection
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:phase: {}
    manager: kube-controller-manager
    operation: Update
    time: "2022-02-04T17:52:28Z"
@kingdonb this is the PV, not the PVC.
Whoops, you're right... I already tore down the test scaffold, I'll have to repeat the test again later. Sorry for the noise.
@Legion2 is it possible for you to paste the managedFields of the PVC from before you updated Flux?
No, I don't have this data.
Removing f:volumeName from the managedFields with a kubectl patch should unblock you. But we need to figure out how kustomize-controller ended up owning that field. Was this PVC created with Flux v1 or with kubectl?
We used flux from the beginning and used it for everything in the cluster, so I think it was created with flux 1. However, since then we have migrated things multiple times and needed to fix things manually, so I'm not 100% sure that kubectl wasn't involved here.
@stefanprodan I'm not familiar with the kubectl patch command, could you give an example on how to remove a managed field?
Create a YAML with only the metadata and managed fields, remove the volumeName, then apply it with kubectl patch; docs here: https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/
We had the same problem with a lot of Kustomizations and immutable clusterIP, volumeName and storageClassName fields when updating from kustomize-controller 0.14.1 to 0.20.2.
flux-system backstage False Service/backstage/patroni-metrics dry-run failed, reason: Invalid, error: Service "patroni-metrics" is invalid: [spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset, spec.ipFamilies[0]: Invalid value: []core.IPFamily(nil): primary ipFamily can not be unset]...
Removing the managed fields via patch helps, though.
kubectl -n namespace patch svc service-name --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:clusterIP"}]'
kubectl -n namespace patch pvc pvc-name --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:volumeName"}]'
kubectl -n namespace patch storageclass storageclass-name --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:storageClassName"}]'
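Note that these commands assume the kustomize-controller entry is the first one (index 0) in managedFields; if it is not, the patch path needs the actual index. A small sketch to look the index up first (jq assumed; NAMESPACE and PVC_NAME are placeholders):
# Find which managedFields entry belongs to kustomize-controller
IDX=$(kubectl -n NAMESPACE get pvc PVC_NAME -o json --show-managed-fields \
  | jq '[.metadata.managedFields[].manager] | index("kustomize-controller")')
# Remove the immutable field from that entry
kubectl -n NAMESPACE patch pvc PVC_NAME --type=json \
  -p="[{\"op\": \"remove\", \"path\": \"/metadata/managedFields/${IDX}/fieldsV1/f:spec/f:volumeName\"}]"
The same index lookup applies to the Service and StorageClass variants above.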
I was facing the same issue.
Using the following commands was helpful for me:
kubectl -n NAMESPACE patch pvc PVC_NAME --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:storageClassName"}]'
kubectl -n NAMESPACE patch pvc PVC_NAME --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:volumeName"}]'
I've got the same problem migrating from flux v1 to flux v2 (0.26.1), with ingress-nginx.
Flux complains with:
❯ flux reconcile kustomization infra --with-source
► annotating GitRepository flux-system in flux-system namespace
✔ GitRepository annotated
◎ waiting for GitRepository reconciliation
✔ fetched revision aks1-westeurope/4bcf2a7eb50d2f869d217f850fa59955754d6375
► annotating Kustomization infra in flux-system namespace
✔ Kustomization annotated
◎ waiting for Kustomization reconciliation
✗ Kustomization reconciliation failed: Service/ingress-nginx/ingress-nginx dry-run failed, reason: Invalid, error: Service "ingress-nginx" is invalid: spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset
Here's what's in k8s:
apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: [snip].
  creationTimestamp: "2020-10-22T16:20:39Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    kustomize.toolkit.fluxcd.io/name: infra
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: ingress-nginx
  namespace: ingress-nginx
  resourceVersion: "192626589"
  uid: a79bc74c-aa57-4223-9f80-0b25249f13b9
spec:
  clusterIP: 10.113.45.120
  clusterIPs:
  - 10.113.45.120
  externalTrafficPolicy: Local
  healthCheckNodePort: 32210
  ports:
  - name: http
    nodePort: 32695
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: 30120
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: [snip]
And here's what we always had in git:
kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  annotations:
    external-dns.alpha.kubernetes.io/hostname: ingress.aks1.westeurope.azure.cratedb.net.
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
  - name: http
    port: 80
    targetPort: http
  - name: https
    port: 443
    targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
Banging my head on the table with this. Any help much appreciated.
If you want to see the actual problematic fields you have to add "--show-managed-fields" to your kubectl command.
See my post above for a workaround: https://github.com/fluxcd/flux2/issues/2386#issuecomment-1040102499
kubectl -n ingress-nginx patch svc ingress-nginx --type=json -p='[{"op": "remove", "path": "/metadata/managedFields/0/fieldsV1/f:spec/f:clusterIP"}]'
Thank you! I did not know that managed fields are no longer shown by default, assumed it was a different issue. Problem solved.
If you have not tried upgrading to Flux ~v0.26.2~ v0.26.3 or greater, it has breaking changes around this area:
- https://github.com/fluxcd/kustomize-controller/pull/562
The issue it fixes is here:
- https://github.com/fluxcd/kustomize-controller/issues/558
This could easily be the same issue, and I think it most likely is. I see reports mentioning version v0.26.1 but I do not see anyone who has mentioned this issue on a version >= ~v0.26.2~ v0.26.3, which included the fix above (and another important one, @stefanprodan mentions below.)
Flux takes over all the managed fields, so if you have edits which you expect to remain in the cluster but which are not in git, they will have to have a manager set to avoid being overwritten by Flux. So I want to be careful about advising this upgrade, though it is important and it makes Flux work more as advertised (so I do not want to caution anyone away from it).
https://fluxcd.io/docs/faq/#why-are-kubectl-edits-rolled-back-by-flux
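If there are cluster-only edits you want Flux to leave alone after this upgrade, one option (a sketch based on the behaviour described in this thread, not official guidance; the manager name and file below are placeholders) is to apply them with a field manager that is not kubectl, since the v0.26.3 change only takes over kubectl-owned fields:
# Apply a manual change under a dedicated field manager instead of the default kubectl one
kubectl apply --server-side --field-manager=manual-ops -f my-manual-edit.yaml
Fields owned by a manager like this, and absent from git, should then not be taken over on the next reconciliation.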
If we can confirm this issue is still present in later versions of Flux, I will be glad to investigate. The kubectl patch described above should no longer be necessary after the upgrade. If anyone is still struggling with this, please let us know. 🙏
I think those that upgraded to v0.26.0, v0.26.1 or v0.26.2 will have this issue. In v0.26.3 we found a better way to take over the fields managed by kubectl. We'll need to point people who upgraded to v0.26.0-2 to this issue, as patching the managedFields to remove the immutable ones is the only way to fix this.
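For anyone who needs to locate the affected objects before patching, a rough discovery pass like the following can help (jq assumed; it lists Services whose clusterIP is claimed by kustomize-controller, and the same pattern works for PVCs and volumeName):
# Print namespace/name of every Service where kustomize-controller owns f:spec/f:clusterIP
kubectl get svc -A -o json --show-managed-fields \
  | jq -r '.items[] | select(any(.metadata.managedFields[]?; .manager == "kustomize-controller" and .fieldsV1."f:spec"."f:clusterIP" != null)) | "\(.metadata.namespace)/\(.metadata.name)"'
Each object it prints is a candidate for the managedFields patch described earlier in this thread.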
Same problem here with flux v0.24.1 and Kubernetes v1.21.5. I have a PersistentVolumeClaim, which was created from a HelmRelease by FluxCD v0.24.1 on that very cluster some hours before, and that looks like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-loki-compactor
  namespace: loki-system
  uid: 87edda4f-3878-4c14-becc-4a5d5eb398ae
  resourceVersion: "122520272"
  creationTimestamp: "2022-02-25T11:38:06Z"
  labels:
    app.kubernetes.io/component: compactor
    app.kubernetes.io/instance: loki-system-loki-distributed
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki-distributed
    app.kubernetes.io/version: 2.4.2
    helm.sh/chart: loki-distributed-0.44.0
    helm.toolkit.fluxcd.io/name: loki-distributed
    helm.toolkit.fluxcd.io/namespace: kube-system
  annotations:
    meta.helm.sh/release-name: loki-system-loki-distributed
    meta.helm.sh/release-namespace: loki-system
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
    volume.kubernetes.io/selected-node: ip-10-176-39-133.eu-central-1.compute.internal
  finalizers:
  - kubernetes.io/pvc-protection
  managedFields:
  - manager: helm-controller
    operation: Update
    apiVersion: v1
    time: "2022-02-25T11:38:06Z"
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/component: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:app.kubernetes.io/version: {}
          f:helm.sh/chart: {}
          f:helm.toolkit.fluxcd.io/name: {}
          f:helm.toolkit.fluxcd.io/namespace: {}
      f:spec:
        f:accessModes: {}
        f:resources:
          f:requests:
            .: {}
            f:storage: {}
        f:storageClassName: {}
        f:volumeMode: {}
  - manager: kube-scheduler
    operation: Update
    apiVersion: v1
    time: "2022-02-25T11:38:07Z"
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:volume.kubernetes.io/selected-node: {}
  - manager: kube-controller-manager
    operation: Update
    apiVersion: v1
    time: "2022-02-25T11:38:10Z"
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:pv.kubernetes.io/bind-completed: {}
          f:pv.kubernetes.io/bound-by-controller: {}
          f:volume.beta.kubernetes.io/storage-provisioner: {}
      f:spec:
        f:volumeName: {}
      f:status:
        f:accessModes: {}
        f:capacity:
          .: {}
          f:storage: {}
        f:phase: {}
  selfLink: /api/v1/namespaces/loki-system/persistentvolumeclaims/data-loki-compactor
status:
  phase: Bound
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-87edda4f-3878-4c14-becc-4a5d5eb398ae
  storageClassName: gp
  volumeMode: Filesystem
And now I get the following error upon reconcile of the corresponding HelmRelease:
❯ flux reconcile hr -n kube-system loki-distributed
► annotating HelmRelease loki-distributed in kube-system namespace
✔ HelmRelease annotated
◎ waiting for HelmRelease reconciliation
✗ HelmRelease reconciliation failed: Helm upgrade failed: failed to replace object: PersistentVolumeClaim "data-loki-compactor" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
core.PersistentVolumeClaimSpec{
AccessModes: {"ReadWriteOnce"},
Selector: nil,
Resources: {Requests: {s"storage": {i: {...}, s: "10Gi", Format: "BinarySI"}}},
- VolumeName: "",
+ VolumeName: "pvc-87edda4f-3878-4c14-becc-4a5d5eb398ae",
StorageClassName: &"gp",
VolumeMode: &"Filesystem",
DataSource: nil,
}
Here is the link to the offending template; it does not set volumeName (which is correct).
@stefanprodan
Hi, we have the same issue and we are using the latest version, v0.27.2, with K8s version 1.21.1.
Can you advise please?
`flux-system xxx False Service/xxx/xxx-manager-metrics-service dry-run failed, reason: Invalid, error: Service "xxx-manager-metrics-service" is invalid: [spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset, spec.ipFamilies[0]: Invalid value: []core.IPFamily(nil): primary ipFamily can not be unset]... 126d
flux-system yyy False Service/yyy/yyy dry-run failed, reason: Invalid, error: Service "yyy" is invalid: [spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset, spec.ipFamilies[0]: Invalid value: []core.IPFamily(nil): primary ipFamily can not be unset]... 126d
flux-system zzz False Service/zzz/zzz-controller-metrics dry-run failed, reason: Invalid, error: Service "zzz-controller-metrics" is invalid: spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset...
flux-system www False Service/www/www-webhook-service dry-run failed, reason: Invalid, error: Service "www-webhook-service" is invalid: [spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset, spec.ipFamilies[0]: Invalid value: []core.IPFamily(nil): primary ipFamily can not be unset]...`