
[Bug] Volume expansion error in the tenant clusters

Open kinseii opened this issue 1 month ago • 3 comments

Describe the bug

Let's create a PVC in a tenant cluster:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: debug-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi

Once the PVC has been created, let's try to expand it:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: debug-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Mi

We will get an error:

$ kc describe pvc debug-pvc -n default
Name:          debug-pvc
Namespace:     default
StorageClass:  kubevirt
Status:        Bound
Volume:        pvc-05710dcd-30e1-4722-8fdb-5cbad98344b5
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.kubevirt.io
               volume.kubernetes.io/storage-provisioner: csi.kubevirt.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Mi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       <none>
Conditions:
  Type                    Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                    ------  -----------------                 ------------------                ------  -------
  Resizing                True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 20 Nov 2025 16:35:51 -0800
  ControllerResizeError   True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 20 Nov 2025 16:35:47 -0800           failed to expand pvc with rpc error: code = Internal desc = Failed to expand PVC pvc-05710dcd-30e1-4722-8fdb-5cbad98344b5: PersistentVolumeClaim "pvc-05710dcd-30e1-4722-8fdb-5cbad98344b5" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests and volumeAttributesClassName for bound claims
  core.PersistentVolumeClaimSpec{
    AccessModes: {"ReadWriteMany"},
    Selector:    nil,
    Resources: core.VolumeResourceRequirements{
      Limits: nil,
-     Requests: core.ResourceList{
-       s"storage": {i: resource.int64Amount{value: 10485760}, s: "10485760", Format: "DecimalSI"},
-     },
+     Requests: core.ResourceList{
+       s"storage": {i: resource.int64Amount{value: 20971520}, s: "20971520", Format: "DecimalSI"},
+     },
    },
    VolumeName:       "",
    StorageClassName: &"replicated",
    ... // 4 identical fields
  }

  FileSystemResizePending   True   Mon, 01 Jan 0001 00:00:00 +0000   Thu, 20 Nov 2025 16:35:53 -0800      Waiting for user to (re-)start a pod to finish file system resize of volume on node.
Events:
  Type     Reason                 Age                From                                                                                                                 Message
  ----     ------                 ----               ----                                                                                                                 -------
  Normal   ExternalProvisioning   14m                persistentvolume-controller                                                                                          Waiting for a volume to be created either by the external provisioner 'csi.kubevirt.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal   Provisioning           14m                csi.kubevirt.io_kubernetes-qwerty-xxx-xxx-xxx-1-kcsi-controller-c9c88ccdc-bzkl7_e2d6b40a-43c1-406a-a05e-2bec666f5acb  External provisioner is provisioning volume for claim "default/debug-pvc"
  Normal   ProvisioningSucceeded  14m                csi.kubevirt.io_kubernetes-qwerty-xxx-xxx-xxx-1-kcsi-controller-c9c88ccdc-bzkl7_e2d6b40a-43c1-406a-a05e-2bec666f5acb  Successfully provisioned volume pvc-05710dcd-30e1-4722-8fdb-5cbad98344b5
  Normal   ExternalExpanding      14m                volume_expand                                                                                                        waiting for an external controller to expand this PVC
  Warning  VolumeResizeFailed     14m (x3 over 14m)  external-resizer csi.kubevirt.io                                                                                     resize volume "pvc-05710dcd-30e1-4722-8fdb-5cbad98344b5" by resizer "csi.kubevirt.io" failed: rpc error: code = Internal desc = Failed to expand PVC pvc-05710dcd-30e1-4722-8fdb-5cbad98344b5: PersistentVolumeClaim "pvc-05710dcd-30e1-4722-8fdb-5cbad98344b5" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests and volumeAttributesClassName for bound claims
  core.PersistentVolumeClaimSpec{
    AccessModes: {"ReadWriteMany"},
    Selector:    nil,
    Resources: core.VolumeResourceRequirements{
      Limits: nil,
-     Requests: core.ResourceList{
-       s"storage": {i: resource.int64Amount{value: 10485760}, s: "10485760", Format: "DecimalSI"},
-     },
+     Requests: core.ResourceList{
+       s"storage": {i: resource.int64Amount{value: 20971520}, s: "20971520", Format: "DecimalSI"},
+     },
    },
    VolumeName:       "",
    StorageClassName: &"replicated",
    ... // 4 identical fields
  }
  Normal  Resizing                  14m (x4 over 14m)  external-resizer csi.kubevirt.io  External resizer is resizing volume pvc-05710dcd-30e1-4722-8fdb-5cbad98344b5
  Normal  FileSystemResizeRequired  14m                external-resizer csi.kubevirt.io  Require file system resize of volume on node
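For reference, the byte counts in the diff above match the requested sizes: 10Mi is 10 × 2^20 = 10485760 bytes and 20Mi is 20971520 bytes. A minimal Python sketch of that conversion follows. `to_bytes` is a hypothetical helper written for this illustration; it handles only the binary suffixes Ki, Mi, Gi and Ti, not the full Kubernetes quantity grammar.

```python
# Hypothetical helper: converts a small subset of Kubernetes quantity
# strings (binary suffixes only) into a byte count.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Return the number of bytes described by a quantity such as '10Mi'."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No known suffix: treat the string as a plain byte count.
    return int(quantity)


print(to_bytes("10Mi"))  # 10485760
print(to_bytes("20Mi"))  # 20971520
```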

Environment

  • Cozystack version: v0.37.7
  • Provider: on-prem

kinseii avatar Nov 21 '25 01:11 kinseii

The error you're seeing, "spec is immutable after creation except resources.requests and volumeAttributesClassName for bound claims", means that something is trying to change immutable fields on the PVC during expansion, such as accessModes or storageClassName. The likely culprit is the CSI external-resizer or a sync process between the tenant and management clusters. In Kubernetes, only resources.requests.storage and volumeAttributesClassName can be changed after a PVC is bound; all other fields are locked, and any attempt to change them triggers this error.

Your logs show that accessModes is being changed from ReadWriteOnce to ReadWriteMany and storageClassName from kubevirt to replicated during the expansion attempt, which is not allowed. This suggests that either the KubeVirt CSI resizer or the logic that syncs PVCs between the tenant and management clusters is inadvertently modifying these fields during the resize operation, rather than only updating the resources.requests.storage field as required by Kubernetes API semantics.

The KubeVirt CSI driver and its resizer sidecar are configured to allow volume expansion, and the RBAC permissions are set up to permit the necessary updates to PVCs and PVs. However, if any part of the expansion workflow attempts to patch immutable fields, the Kubernetes API will reject the request with exactly the error you're seeing. This is enforced at the API level and cannot be bypassed by the CSI driver or any controller logic.

To resolve this, check for any custom logic or sync controllers that might be copying or reconciling the entire PVC spec (instead of just the allowed fields) between clusters during expansion. The expansion should only update the resources.requests.storage field on the bound PVC. If you find such logic, it should be updated to avoid changing immutable fields during expansion.
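The check described above can be sketched in Python: compare the spec a controller is about to send with the spec of the bound PVC, and report every changed field outside the two that the error message names as mutable. This is an illustration only, not how the API server performs validation. `forbidden_changes` and `flatten` are hypothetical helpers, and the sample values simply mirror the reading of the logs given above.

```python
# Fields that may change on a bound PVC, per the error message in this issue.
MUTABLE_PATHS = {("resources", "requests"), ("volumeAttributesClassName",)}


def flatten(spec, prefix=()):
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for key, value in spec.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def forbidden_changes(old_spec, new_spec):
    """Return dotted paths that differ and are not under a mutable path."""
    old, new = dict(flatten(old_spec)), dict(flatten(new_spec))
    changed = {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}
    return sorted(
        ".".join(p)
        for p in changed
        if not any(p[: len(m)] == m for m in MUTABLE_PATHS)
    )


bound = {
    "accessModes": ["ReadWriteOnce"],
    "storageClassName": "kubevirt",
    "resources": {"requests": {"storage": "10Mi"}},
}
resize_only = {**bound, "resources": {"requests": {"storage": "20Mi"}}}
full_copy = {
    "accessModes": ["ReadWriteMany"],
    "storageClassName": "replicated",
    "resources": {"requests": {"storage": "20Mi"}},
}

print(forbidden_changes(bound, resize_only))  # []
print(forbidden_changes(bound, full_copy))    # ['accessModes', 'storageClassName']
```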

If you need to work around this in the short term, ensure that the PVC's accessModes and storageClassName remain unchanged during the resize operation, and only the resources.requests.storage field is modified.
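As a rough sketch of a resize request that touches nothing but the storage request, the Python snippet below builds a JSON merge patch body. `resize_patch` is a hypothetical helper name; the PVC name and size come from the report above.

```python
import json


def resize_patch(new_size: str) -> str:
    """Return a JSON merge patch that changes only the storage request."""
    return json.dumps(
        {"spec": {"resources": {"requests": {"storage": new_size}}}}
    )


print(resize_patch("20Mi"))
# {"spec": {"resources": {"requests": {"storage": "20Mi"}}}}
```

A body like this could be passed to `kubectl patch pvc debug-pvc --type merge -p '<body>'` in the tenant cluster. Because a merge patch leaves every unmentioned field as it is, accessModes and storageClassName are not part of the request.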

If you want to dig deeper, enabling verbose logging on the CSI resizer and related controllers (they are already set to --v=5 in the deployment) may help pinpoint where the unwanted spec changes are coming from.



dosubot[bot] avatar Nov 21 '25 01:11 dosubot[bot]

Thanks for the report!

It seems the StorageClass needs to specify the allowVolumeExpansion: true parameter.

kvaps avatar Nov 27 '25 17:11 kvaps

@kinseii could you please check whether your StorageClass has this parameter specified?

https://github.com/cozystack/cozystack/blob/9d1fb4ccf2412d725cafaae3387612086e3faa6b/packages/system/kubevirt-csi-node/templates/deploy.yaml#L276

kvaps avatar Nov 27 '25 18:11 kvaps

Quoting kvaps: "@kinseii could you please check if your storageclass has this parameter specified"

cozystack/packages/system/kubevirt-csi-node/templates/deploy.yaml, line 276 in 9d1fb4c:

allowVolumeExpansion: true

Hi! Yes, we have it on both the tenant and the root clusters. Tenant cluster:

$ kc get sc -A
NAME                 PROVISIONER       RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
kubevirt (default)   csi.kubevirt.io   Delete          Immediate           true                   28d

$ kc get sc kubevirt -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: csi
    meta.helm.sh/release-namespace: cozy-csi
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2025-10-31T00:24:58Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: kubernetes-qwerty-xxxx-xxx-xx-1-csi
    helm.toolkit.fluxcd.io/namespace: tenant-qwerty
  name: kubevirt
  resourceVersion: "189462413"
  uid: ef5b062a-af59-4556-9163-8ec85b65d861
parameters:
  bus: scsi
  infraStorageClassName: replicated
provisioner: csi.kubevirt.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

$ kc describe sc kubevirt
Name:                  kubevirt
IsDefaultClass:        Yes
Annotations:           meta.helm.sh/release-name=csi,meta.helm.sh/release-namespace=cozy-csi,storageclass.kubernetes.io/is-default-class=true
Provisioner:           csi.kubevirt.io
Parameters:            bus=scsi,infraStorageClassName=replicated
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

Root cluster:

# kc get sc replicated
NAME                   PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
replicated (default)   linstor.csi.linbit.com   Delete          Immediate           true                   118d

# kc get sc replicated -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"replicated"},"parameters":{"linstor.csi.linbit.com/allowRemoteVolumeAccess":"true","resourceGroup":"replicated-raidz1-nvme"},"provisioner":"linstor.csi.linbit.com","volumeBindingMode":"Immediate"}
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2025-08-01T23:43:20Z"
  name: replicated
  resourceVersion: "48912904"
  uid: 07d382b1-5f3e-4878-9840-3f742fb31fe3
parameters:
  linstor.csi.linbit.com/allowRemoteVolumeAccess: "true"
  resourceGroup: replicated-raidz1-nvme
provisioner: linstor.csi.linbit.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

# kc describe sc replicated
Name:            replicated
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"replicated"},"parameters":{"linstor.csi.linbit.com/allowRemoteVolumeAccess":"true","resourceGroup":"replicated-raidz1-nvme"},"provisioner":"linstor.csi.linbit.com","volumeBindingMode":"Immediate"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           linstor.csi.linbit.com
Parameters:            linstor.csi.linbit.com/allowRemoteVolumeAccess=true,resourceGroup=replicated-raidz1-nvme
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>
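The two checks above can be combined into one small Python sketch. It follows the tenant class's infraStorageClassName parameter to the root-cluster class and confirms that both allow expansion. `expansion_enabled` is a hypothetical helper, and the dicts hold only the fields from the YAML output above that matter here.

```python
def expansion_enabled(tenant_sc: dict, infra_classes: dict) -> bool:
    """True if the tenant class and the infra class it points to both allow expansion."""
    infra_name = tenant_sc.get("parameters", {}).get("infraStorageClassName")
    infra_sc = infra_classes.get(infra_name, {})
    return bool(tenant_sc.get("allowVolumeExpansion")) and bool(
        infra_sc.get("allowVolumeExpansion")
    )


# Values copied from the StorageClass output above.
tenant = {
    "allowVolumeExpansion": True,
    "parameters": {"bus": "scsi", "infraStorageClassName": "replicated"},
    "provisioner": "csi.kubevirt.io",
}
root_classes = {
    "replicated": {
        "allowVolumeExpansion": True,
        "provisioner": "linstor.csi.linbit.com",
    }
}

print(expansion_enabled(tenant, root_classes))  # True
```

With the values from this cluster the check passes, which is consistent with the setting being present on both sides.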

kinseii avatar Nov 28 '25 13:11 kinseii