
[cinder-csi-plugin] PVC resize not reflecting in pod filesystem

Open kpauljoseph opened this issue 2 years ago • 35 comments

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

What happened: Edited the PersistentVolumeClaim to increase its size. The PVC and underlying PV get updated with the new size. But exec-ing into the pod and checking disk space ($ df -h) shows the old value. The only way to get the filesystem size updated is to log in to the node and manually run resize2fs for that specific block device.

What you expected to happen: The filesystem size inside the pod should be updated to the new value.

How to reproduce it:

  1. Create a PVC and mount it in a pod.
  2. Exec into the pod and check the filesystem size (df -h).
  3. Edit the PVC with a new, larger size.
  4. Exec into the pod and check the size again. It will show the old value.

Anything else we need to know?: This seems to happen only with the Cinder CSI storage backend in my cluster. I have a few other types of storage classes in the same cluster and they all seem to work fine. I could see a lot of similar issues across GitHub and Red Hat's bug pages; some of them seem to have had fixes pushed in at some point, but the issue is still present.
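For reference, the manual workaround described above (log in to the node, run resize2fs) can be sketched roughly as below. This is my own sketch, not plugin code: the function name is hypothetical, and it assumes an ext4 filesystem staged under the standard kubelet globalmount path.

```shell
# Hedged sketch of the manual workaround, to be run as root on the node that
# hosts the volume. Assumptions (mine, not from this thread): ext4 filesystem,
# PV staged under the standard kubelet globalmount path.
resize_pv_fs() {
  local pv="$1"
  local mount_path="/var/lib/kubelet/plugins/kubernetes.io/csi/pv/${pv}/globalmount"
  # Resolve the backing block device the same way the plugin's GetMountFs does.
  local dev
  dev="$(findmnt -o source --first-only --noheadings --target "$mount_path")" || return 1
  # Grow the ext4 filesystem to fill the already-expanded Cinder volume.
  resize2fs "$dev"
}
```

Usage would be e.g. `resize_pv_fs pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536` on the affected node.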

Environment:

  • openstack-cloud-controller-manager version: 1.25.3
  • OpenStack version: 3.18.0
  • cinder-csi-plugin version: 1.25.3
  • cloud_provider_openstack_version: 1.25.3
  • kubernetes version: 1.24.2

kpauljoseph avatar Mar 01 '23 13:03 kpauljoseph

This has been reported multiple times; maybe the following can be helpful to you: https://github.com/kubernetes/cloud-provider-openstack/issues/2059

jichenjc avatar Mar 02 '23 01:03 jichenjc

Hi @jichenjc, I encountered a similar issue. When I edited the PersistentVolumeClaim to increase its size, the PVC and underlying PV got updated with the new size. But on the node, the resize failed with an error. Here is the log from the node:

I0301 06:31:59.236703 1 nodeserver.go:541] NodeExpandVolume: called with args {"capacity_range":{"required_bytes":51539607552},"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"6ca713bc-7b3f-4485-8810-3deaacafbbd8","volume_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount"}
E0301 06:31:59.308265 1 utils.go:92] [ID:902218] GRPC error: rpc error: code = Internal desc = Failed to find mount file system /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount: executable file not found in $PATH

The volume is resized in OpenStack:

:~$ openstack volume list | grep 6ca713bc-7b3f-4485-8810-3deaacafbbd8
| 6ca713bc-7b3f-4485-8810-3deaacafbbd8 | pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536 | in-use | 56 | Attached to worker-pool1 on /dev/vdg

Rico556 avatar Mar 02 '23 02:03 Rico556

error: rpc error: code = Internal desc = Failed to find mount file system /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount: executable file not found in $PATH

I didn't encounter this before. executable file not found in $PATH seems to be telling us something is not found; maybe you can check what's missing?

jichenjc avatar Mar 02 '23 02:03 jichenjc

Yes, but I don't know what file it is going to execute. I have already installed resize2fs and mkfs on the node; what other commands does it need to execute during the resizing process?

Rico556 avatar Mar 02 '23 02:03 Rico556

The failing function is:

// GetMountFs shells out to findmnt to resolve the source device backing volumePath.
func (m *Mount) GetMountFs(volumePath string) ([]byte, error) {
        args := []string{"-o", "source", "--first-only", "--noheadings", "--target", volumePath}
        return m.BaseMounter.Exec.Command("findmnt", args...).CombinedOutput()
}

maybe you can check from here.. or add some logs?
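Since GetMountFs shells out to findmnt, a first sanity check is whether the binary is resolvable on PATH at all. A minimal check (the in-cluster kubectl form in the comment is illustrative; substitute your actual nodeplugin pod name):

```shell
# Check that findmnt (which GetMountFs invokes) is resolvable on PATH.
# Inside the cluster, the equivalent check would run in the cinder-csi-plugin
# container of the csi-cinder-nodeplugin pod (pod name is a placeholder):
#   kubectl -n kube-system exec <nodeplugin-pod> -c cinder-csi-plugin -- findmnt --version
command -v findmnt || echo "findmnt: not found on PATH"
```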

jichenjc avatar Mar 02 '23 02:03 jichenjc

The log loops with the following output:

I0301 06:31:58.221974 1 nodeserver.go:497] NodeGetVolumeStats: called with args {"volume_id":"4539b22a-726e-42dc-8d4a-f4f4b9e2c325","volume_path":"/var/lib/kubelet/pods/dbb4b4cd-1c2a-476f-9502-4a1cd6bc3bd2/volumes/kubernetes.io~csi/pv-d7c1f72a-b90f-4242-9ba3-0f74a470cc84/mount"}
I0301 06:31:58.284863 1 utils.go:88] [ID:902211] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:58.305252 1 utils.go:88] [ID:902212] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:58.306339 1 utils.go:88] [ID:902213] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:58.315955 1 utils.go:88] [ID:902214] GRPC call: /csi.v1.Node/NodeStageVolume
I0301 06:31:58.318144 1 nodeserver.go:352] NodeStageVolume: called with args {"publish_context":{"DevicePath":"/dev/vdv"},"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"storage.kubernetes.io/csiProvisionerIdentity":"1670494123279-8081-cinder.csi.openstack.org"},"volume_id":"6ca713bc-7b3f-4485-8810-3deaacafbbd8"}
I0301 06:31:59.157976 1 mount.go:172] Found disk attached as "virtio-6ca713bc-7b3f-4485-8"; full devicepath: /dev/disk/by-id/virtio-6ca713bc-7b3f-4485-8
I0301 06:31:59.162171 1 mount_linux.go:487] Attempting to determine if disk "/dev/disk/by-id/virtio-6ca713bc-7b3f-4485-8" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/disk/by-id/virtio-6ca713bc-7b3f-4485-8])
I0301 06:31:59.179612 1 mount_linux.go:490] Output: "DEVNAME=/dev/disk/by-id/virtio-6ca713bc-7b3f-4485-8\nTYPE=ext4\n"
I0301 06:31:59.179634 1 mount_linux.go:376] Checking for issues with fsck on disk: /dev/disk/by-id/virtio-6ca713bc-7b3f-4485-8
I0301 06:31:59.209594 1 mount_linux.go:477] Attempting to mount disk /dev/disk/by-id/virtio-6ca713bc-7b3f-4485-8 in ext4 format at /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount
I0301 06:31:59.209644 1 mount_linux.go:183] Mounting cmd (mount) with arguments (-t ext4 -o defaults /dev/disk/by-id/virtio-6ca713bc-7b3f-4485-8 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount)
I0301 06:31:59.223029 1 utils.go:88] [ID:902215] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:59.223949 1 utils.go:88] [ID:902216] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:59.224606 1 utils.go:88] [ID:902217] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:59.236676 1 utils.go:88] [ID:902218] GRPC call: /csi.v1.Node/NodeExpandVolume
I0301 06:31:59.236703 1 nodeserver.go:541] NodeExpandVolume: called with args {"capacity_range":{"required_bytes":51539607552},"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"6ca713bc-7b3f-4485-8810-3deaacafbbd8","volume_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount"}
E0301 06:31:59.308265 1 utils.go:92] [ID:902218] GRPC error: rpc error: code = Internal desc = Failed to find mount file system /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount: executable file not found in $PATH
I0301 06:31:59.900837 1 utils.go:88] [ID:902219] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:59.925454 1 utils.go:88] [ID:902220] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:59.926046 1 utils.go:88] [ID:902221] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 06:31:59.926603 1 utils.go:88] [ID:902222] GRPC call: /csi.v1.Node/NodeStageVolume

Rico556 avatar Mar 02 '23 03:03 Rico556

I resized the PVC from 32G to 56G:

~ # df -h |grep pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536
/dev/vdg         32G   32G     0 100% /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount
:~ # fdisk /dev/vdg

Welcome to fdisk (util-linux 2.36.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

The device contains 'ext4' signature and it will be removed by a write command. See fdisk(8) man page and --wipe option for more details.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x45ae7994.

Command (m for help): p
Disk /dev/vdg: 56 GiB, 60129542144 bytes, 117440512 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x45ae7994

Command (m for help): F
Unpartitioned space /dev/vdg: 56 GiB, 60128493568 bytes, 117438464 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes

Start       End   Sectors Size
 2048 117440511 117438464  56G

Rico556 avatar Mar 02 '23 05:03 Rico556

this has been reported multiple times, maybe following can be helpful to you #2059

Hi @jichenjc, I followed that troubleshooting guide and also looked at other similar issues, which hinted that it could be an OpenStack environment problem. I tried going through the OpenStack logs but couldn't find any related error messages.

I then went through the cinder CSI related container logs and found one specific line:

I0301 10:32:15.767556 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"lcm-container-registry", UID:"a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d", APIVersion:"v1", ResourceVersion:"7493038", FieldPath:""}): type: 'Normal' reason: 'FileSystemResizeRequired' Require file system resize of volume on node

Does this mean that we need to manually resize this every time? Although, I couldn't see this as an error in the logs and events of the pod to which this PVC is tied.

I tracked a few more related logs from around the same time across csi cinder related pods:

$ kubectl logs -n kube-system csi-cinder-controllerplugin-7b66gg465d-kl8n4 csi-resizer

I0301 10:32:14.282683       1 controller.go:291] Started PVC processing "kube-system/lcm-container-registry"
I0301 10:32:14.317253       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"lcm-container-registry", UID:"a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d", APIVersion:"v1", ResourceVersion:"7493038", FieldPath:""}): type: 'Normal' reason: 'Resizing' External resizer is resizing volume k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d
I0301 10:32:15.751642       1 controller.go:468] Resize volume succeeded for volume "k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d", start to update PV's capacity
I0301 10:32:15.751674       1 controller.go:570] Resize volume succeeded for volume "k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d", start to update PV's capacity
I0301 10:32:15.761160       1 controller.go:474] Update capacity of PV "k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d" to 24Gi succeeded
I0301 10:32:15.767483       1 controller.go:496] Mark PVC "kube-system/lcm-container-registry" as file system resize required
I0301 10:32:15.767527       1 controller.go:291] Started PVC processing "kube-system/lcm-container-registry"
I0301 10:32:15.767539       1 controller.go:338] No need to resize PV "k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d"
I0301 10:32:15.767556       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"lcm-container-registry", UID:"a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d", APIVersion:"v1", ResourceVersion:"7493038", FieldPath:""}): type: 'Normal' reason: 'FileSystemResizeRequired' Require file system resize of volume on node

$ kubectl logs -n kube-system csi-cinder-controllerplugin-7b66gg465d-kl8n4 cinder-csi-plugin

I0301 10:32:12.620718       1 utils.go:88] [ID:51588] GRPC call: /csi.v1.Controller/ListVolumes
I0301 10:32:14.328805       1 utils.go:88] [ID:51589] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0301 10:32:14.331517       1 utils.go:88] [ID:51590] GRPC call: /csi.v1.Controller/ControllerExpandVolume
I0301 10:32:14.337582       1 controllerserver.go:595] ControllerExpandVolume: called with args {"capacity_range":{"required_bytes":25769803776},"volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"22d86e37-21d1-3125-1234-2b4e6f92e2d9"}
I0301 10:32:15.751369       1 controllerserver.go:644] ControllerExpandVolume resized volume 22d86e37-21d1-3125-1234-2b4e6f92e2d9 to size 24

$ kubectl logs csi-cinder-nodeplugin-mbwp4 -n kube-system cinder-csi-plugin

I0301 11:41:57.748435       1 utils.go:88] [ID:152499] GRPC call: /csi.v1.Node/NodeStageVolume
I0301 11:41:57.748459       1 nodeserver.go:352] NodeStageVolume: called with args {"publish_context":{"DevicePath":"/dev/vdb"},"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"storage.kubernetes.io/csiProvisionerIdentity":"1676108077029-8081-cinder.csi.openstack.org"},"volume_id":"22d86e37-21d1-3125-1234-2b4e6f92e2d9"}
I0301 11:41:58.518221       1 mount.go:172] Found disk attached as "virtio-22d86e37-21d1-4589-8"; full devicepath: /dev/disk/by-id/virtio-22d86e37-21d1-4589-8
I0301 11:41:58.518279       1 mount_linux.go:487] Attempting to determine if disk "/dev/disk/by-id/virtio-22d86e37-21d1-4589-8" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/disk/by-id/virtio-22d86e37-21d1-4589-8])
I0301 11:41:58.534191       1 mount_linux.go:490] Output: "DEVNAME=/dev/disk/by-id/virtio-22d86e37-21d1-4589-8\nTYPE=ext4\n"
I0301 11:41:58.534215       1 mount_linux.go:376] Checking for issues with fsck on disk: /dev/disk/by-id/virtio-22d86e37-21d1-4589-8
I0301 11:41:58.562693       1 mount_linux.go:477] Attempting to mount disk /dev/disk/by-id/virtio-22d86e37-21d1-4589-8 in ext4 format at /var/lib/kubelet/plugins/kubernetes.io/csi/pv/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/globalmount
I0301 11:41:58.562738       1 mount_linux.go:183] Mounting cmd (mount) with arguments (-t ext4 -o defaults /dev/disk/by-id/virtio-22d86e37-21d1-4589-8 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/globalmount)
I0301 11:41:58.576616       1 utils.go:88] [ID:152500] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 11:41:58.579632       1 utils.go:88] [ID:152501] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 11:41:58.580290       1 utils.go:88] [ID:152502] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0301 11:41:58.581136       1 utils.go:88] [ID:152503] GRPC call: /csi.v1.Node/NodePublishVolume
I0301 11:41:58.581150       1 nodeserver.go:51] NodePublishVolume: called with args {"publish_context":{"DevicePath":"/dev/vdb"},"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/globalmount","target_path":"/var/lib/kubelet/pods/b04d239f-b1c5-4ac7-8c3b-04ce8257f5f5/volumes/kubernetes.io~csi/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"lcm-container-registry-registry-6d4f6c46bd-4hrpw","csi.storage.k8s.io/pod.namespace":"kube-system","csi.storage.k8s.io/pod.uid":"b04d239f-b1c5-4ac7-8c3b-04ce8257f5f5","csi.storage.k8s.io/serviceAccount.name":"lcm-container-registry-registry","storage.kubernetes.io/csiProvisionerIdentity":"1676108077029-8081-cinder.csi.openstack.org"},"volume_id":"22d86e37-21d1-3125-1234-2b4e6f92e2d9"}
I0301 11:41:58.656960       1 mount_linux.go:183] Mounting cmd (mount) with arguments (-t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/globalmount /var/lib/kubelet/pods/b04d239f-b1c5-4ac7-8c3b-04ce8257f5f5/volumes/kubernetes.io~csi/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/mount)
I0301 11:41:58.658532       1 mount_linux.go:183] Mounting cmd (mount) with arguments (-t ext4 -o bind,remount,rw /var/lib/kubelet/plugins/kubernetes.io/csi/pv/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/globalmount /var/lib/kubelet/pods/b04d239f-b1c5-4ac7-8c3b-04ce8257f5f5/volumes/kubernetes.io~csi/k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d/mount)

kpauljoseph avatar Mar 02 '23 12:03 kpauljoseph

@kpauljoseph can you share your PVC storage class details? e.g.

$ kubectl get sc `kubectl get pvc my-pvc -o json | jq -r '.spec.storageClassName'` -o yaml

should contain allowVolumeExpansion: true

kayrus avatar Mar 02 '23 13:03 kayrus

@kayrus it's enabled

$ kubectl get storageclasses.storage.k8s.io
NAME                      PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
network-block (default)   cinder.csi.openstack.org   Delete          Immediate           true                   21d

$ kubectl describe storageclasses.storage.k8s.io network-block
Name:                  network-block
IsDefaultClass:        Yes
Annotations:           storageclass.kubernetes.io/is-default-class=true
Provisioner:           cinder.csi.openstack.org
Parameters:            availability=nova,csi.storage.k8s.io/fstype=ext4
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

$ kubectl get sc `kubectl get pvc lcm-container-registry -n kube-system -o json | jq -r '.spec.storageClassName'` -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2023-02-09T12:35:35Z"
  name: network-block
  resourceVersion: "1470"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/network-block
  uid: b31d8747-cce8-1234-5678-9d1b647b96c3
parameters:
  availability: nova
  csi.storage.k8s.io/fstype: ext4
provisioner: cinder.csi.openstack.org
reclaimPolicy: Delete
volumeBindingMode: Immediate

kpauljoseph avatar Mar 02 '23 17:03 kpauljoseph

@kpauljoseph the log should report NodeExpandVolume: called with args xxx like @Rico556's log shows, but it seems your log doesn't have this.

I suspect the resizer didn't call CPO, judging from your log: I0301 10:32:15.767539 1 controller.go:338] No need to resize PV "k8s-stack-a7effd8f-3b4f-3d86-8c5e-bf8a9bde359d"

Though I'm not sure what happened behind the scenes, technically we should not need a manual resize, so I suspect something is wrong in @Rico556's env here; that's the reason I think maybe findmnt is not installed?

Failed to find mount file system /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b2f39bc2-65ac-48bc-8d64-b75e4f826536/globalmount: executable file not found in $PATH 

jichenjc avatar Mar 03 '23 00:03 jichenjc

Hi @jichenjc, I checked and findmnt is already installed.

:~ # findmnt --help

Usage:
 findmnt [options]
 findmnt [options] <device> | <mountpoint>
 findmnt [options] <device> <mountpoint>
 findmnt [options] [--source <device>] [--target <path> | --mountpoint <dir>]

Find a (mounted) filesystem.

Rico556 avatar Mar 03 '23 01:03 Rico556

hm. I wonder whether this test is somehow related to this issue: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/cloud-provider-openstack/2140/openstack-cloud-csi-cinder-e2e-test-release-125/1631575594753855488

while waiting for fs resize to finish: error waiting for pvc "inline-volume-tester-2vcb9-my-volume-0" filesystem resize to finish: timed out waiting for the condition

kayrus avatar Mar 03 '23 10:03 kayrus

filesystem resize to finish: timed out waiting for the condition

Hmm, this wasn't reported in the above logs; will dig into this a little bit.

jichenjc avatar Mar 06 '23 01:03 jichenjc

Hi @jichenjc, any update on this issue? Is there any workaround you would suggest?

dhiman360 avatar Mar 23 '23 04:03 dhiman360

Sorry, not yet... too busy with other stuff recently. If anyone has any insight, that would be helpful @kayrus @zetaab

jichenjc avatar Mar 24 '23 01:03 jichenjc

@jichenjc Wondering if there's any update on this issue?

dhiman360 avatar Apr 24 '23 05:04 dhiman360

#2059 might be helpful, especially the last 2 comments from @seanschneeweiss

jichenjc avatar Apr 24 '23 05:04 jichenjc

Hi @jichenjc. I see that some merges are done for the ticket you mentioned above (https://bugs.launchpad.net/charm-cinder/+bug/1939389). Could that fix this issue? And may I know when the fix will be available through a release version?

dhiman360 avatar Jun 05 '23 05:06 dhiman360

Hi @jichenjc ,

As suggested by the ticket https://bugs.launchpad.net/charm-cinder/+bug/1939389, we added the missing [nova] section (the authentication settings) to cinder.conf. We have already used this configuration and it's still not working.

Example [nova] section from cinder.conf:

[nova]
interface = internal
auth_url = XXXX
auth_type = password
project_domain_id = default
user_domain_id = default
region_name = XXXX
project_name = service
username = nova
password = XXXX
cafile =

dhiman360 avatar Jun 21 '23 05:06 dhiman360

So, let's summarize what we have. You trigger a PVC resize, OpenStack API shows that the volume size has been increased, but the pod with the corresponding PVC doesn't show the expected capacity, right?

  • Do logs contain strings starting with NodeExpandVolume: called with args ...?
  • Have you tried to trigger resize2fs /dev... directly from the pod (your pod needs to be privileged)?
  • What does lsblk say?
  • What kind of hypervisor do you use for VMs?

kayrus avatar Jun 21 '23 07:06 kayrus

Answers:

So, let's summarize what we have. You trigger a PVC resize, OpenStack API shows that the volume size has been increased, but the pod with the corresponding PVC doesn't show the expected capacity, right? --> Here is what we did. We edited the PVC with a new, larger size, from 32Gi to 48Gi ("kubectl get pvc -n kube-system eric-lcm-container-registry"). After this operation, we found the PVC has been extended to 48Gi, but the filesystem size is still 32Gi in the pod.

2-24-0-rel:~> kubectl get pvc -n kube-system eric-lcm-container-registry -oyaml | grep storage
    volume.beta.kubernetes.io/storage-provisioner: cinder.csi.openstack.org
    volume.kubernetes.io/storage-provisioner: cinder.csi.openstack.org
    storage: 48G
  storageClassName: network-block
    storage: 46875000Ki

2-24-0-rel: kubectl exec -it -n kube-system eric-lcm-container-registry-registry-c98794fdf-dzgxg sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "registry" out of: registry, nginx-tls-terminator, sidecar
sh-4.4$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          24G   11G   14G  43% /
tmpfs            64M     0   64M   0% /dev
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/vda3        24G   11G   14G  43% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs           3.8G  4.0K  3.8G   1% /etc/docker/registry
/dev/vdb         32G   28K   32G   1% /var/lib/registry
tmpfs           2.0G     0  2.0G   0% /proc/acpi
tmpfs           2.0G     0  2.0G   0% /proc/scsi
tmpfs           2.0G     0  2.0G   0% /sys/firmware
sh-4.4$

After that I logged into the node and found:

worker-2-24-0-rel: lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1  678K  0 rom
vda    252:0    0   24G  0 disk
├─vda1 252:1    0    2M  0 part
├─vda2 252:2    0   33M  0 part /boot/efi
└─vda3 252:3    0   24G  0 part /var/lib/kubelet/pods/adf25f71-50b8-49b4-bf53-d84217c0bf44/volume-subpaths/sidecar-config/sidecar/0
                                /var/lib/kubelet/pods/adf25f71-50b8-49b4-bf53-d84217c0bf44/volume-subpaths/registry-config/registry/1
                                /opt/cni
                                /
vdb    252:16   0   32G  0 disk /var/lib/kubelet/pods/adf25f71-50b8-49b4-bf53-d84217c0bf44/volume-subpaths/ezghodh-2-24-0-rel-8a69de50-8823-4ede-8cc6-1a412f07bfee/registry/0
                                /var/lib/kubelet/pods/adf25f71-50b8-49b4-bf53-d84217c0bf44/volumes/kubernetes.io~csi/ezghodh-2-24-0-rel-8a69de50-8823-4ede-8cc6-1a412f07bfee/mount
                                /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/64bc60a9af910b155eb6efbcd8f07437ecf3a1b1103e42107c8a72ad302bfc7e/globalmount

Do logs contain strings starting with NodeExpandVolume: called with args ...? --> Yes, found in the node logs as below.

worker-2-24-0-rel: journalctl | grep NodeExpandVolume
Jun 22 06:01:52 worker-pool1-u7zw0i3p-ezghodh-2-24-0-rel kubelet[4458]: I0622 06:01:52.621220 4458 operation_generator.go:2217] "MountVolume.NodeExpandVolume succeeded for volume "ezghodh-2-24-0-rel-8a69de50-8823-4ede-8cc6-1a412f07bfee" (UniqueName: "kubernetes.io/csi/cinder.csi.openstack.org^4b366cb8-5503-4769-80f2-0506cb7ad5f5") pod "eric-lcm-container-registry-registry-c98794fdf-dzgxg" (UID: "adf25f71-50b8-49b4-bf53-d84217c0bf44")" pod="kube-system/eric-lcm-container-registry-registry-c98794fdf-dzgxg"
worker-2-24-0-rel:

What does lsblk say? --> Listed above

What kind of hypervisor do you use for VMs? -->

worker-2-24-0-rel: sudo dmidecode | grep -i -e manufacturer -e product -e vendor
Vendor: SeaBIOS
Manufacturer: OpenStack Foundation
Product Name: OpenStack Nova
Manufacturer: QEMU
Manufacturer: QEMU
Manufacturer: QEMU
Manufacturer: QEMU
Manufacturer: QEMU
Manufacturer: QEMU

dhiman360 avatar Jun 22 '23 06:06 dhiman360

@dhiman360 can you also share the CSI driver version you're using? Is it still 1.25.3? It would also be nice to have at least --v=5 cinder CSI nodeserver logs containing the NodeExpandVolume string.

Please share the output of the kubectl get sc network-block -o yaml.

In addition, can you run these two commands manually on the host node as root and see whether lsblk shows the correct size?

  • udevadm trigger
  • for i in /sys/class/scsi_host/*/scan; do echo '- - -' > $i; done
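One caveat on the rescan suggestion, as a sketch: the scsi_host scan files only affect SCSI-attached disks, while the devices shown in this thread (vdb, vdg) are virtio-blk, which has no per-device scan file and instead picks up size changes via a config-change event from the hypervisor. The helper below is my own illustration; the sd*/vd* naming convention is an assumption:

```shell
# Hedged helper: which rescan mechanism (if any) applies to a device name.
# Assumption: conventional Linux naming, sd* = SCSI disk, vd* = virtio-blk.
rescan_hint() {
  case "$1" in
    sd*) echo "SCSI: echo '- - -' > /sys/class/scsi_host/<host>/scan" ;;
    vd*) echo "virtio-blk: no scan file; size change arrives via config-change event" ;;
    *)   echo "unknown device type: $1" ;;
  esac
}
rescan_hint vdb
```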

P.S. Please use markdown formatting to highlight the logs/cli output.

kayrus avatar Jun 22 '23 08:06 kayrus

Hi @kayrus, I don't see any changes:

2-24-0-rel: kubectl get sc network-block -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2023-03-29T16:43:59Z"
  name: network-block
  resourceVersion: "2536"
  uid: 9d5a23a7-863e-4e63-99a2-a953fa876cfe
parameters:
  availability: nova
  csi.storage.k8s.io/fstype: ext4
provisioner: cinder.csi.openstack.org
reclaimPolicy: Delete
volumeBindingMode: Immediate

worker-2-24-0-rel:/home/eccd # udevadm trigger
worker-2-24-0-rel:/home/eccd # for i in /sys/class/scsi_host/*/scan; do echo '- - -' > $i; done
worker-2-24-0-rel:/home/eccd # lsblk
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0   11:0    1  678K  0 rom
vda  252:0    0   24G  0 disk
├─vda1
│    252:1    0    2M  0 part
├─vda2
│    252:2    0   33M  0 part /boot/efi
└─vda3
     252:3    0   24G  0 part /var/lib/kubelet/pods/adf25f71-50b8-49b4-bf53-d84217c0bf44/volume-subpaths/sidecar-config/sidecar/0
                              /var/lib/kubelet/pods/adf25f71-50b8-49b4-bf53-d84217c0bf44/volume-subpaths/registry-config/registry/1
                              /opt/cni
                              /
vdb  252:16   0   32G  0 disk /var/lib/kubelet/pods/adf25f71-50b8-49b4-bf53-d84217c0bf44/volume-subpaths/ezghodh-2-24-0-rel-8a69de50-8823-4ede-8cc6-1a412f07bfee/registry/0
                              /var/lib/kubelet/pods/adf25f71-50b8-49b4-bf53-d84217c0bf44/volumes/kubernetes.io~csi/ezghodh-2-24-0-rel-8a69de50-8823-4ede-8cc6-1a412f07bfee/mount
                              /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/64bc60a9af910b155eb6efbcd8f07437ecf3a1b1103e42107c8a72ad302bfc7e/globalmount

dhiman360 avatar Jun 22 '23 08:06 dhiman360

@dhiman360 I'm afraid your OpenStack provider doesn't support online volume expansion. Did you have a chance to clarify this question with your OpenStack cloud admins?

kayrus avatar Jun 22 '23 08:06 kayrus

@kayrus can you please provide specific data that I can share with the admin, showing that it doesn't support online volume expansion? Thanks.

dhiman360 avatar Jun 22 '23 08:06 dhiman360

I think the https://github.com/kubernetes/cloud-provider-openstack/issues/2138#issuecomment-1602230637 comment would be enough, along with the output of openstack volume show %% with the desired volume size.

kayrus avatar Jun 22 '23 09:06 kayrus

@kayrus, I have done a set of tests with PVC increments. Please have a look. It seems that if we increase by 8Gi each time it works, but if the increments are not multiples of 8Gi we have issues, and the behaviors differ. Also, a big jump from 8Gi to 40Gi is not reflected either, even though it is a multiple-of-8Gi change.

(screenshot: table of PVC resize increment test results)

Is it something to do with the following property (sio_round_volume_capacity) listed here? https://docs.openstack.org/cinder/rocky/configuration/block-storage/samples/cinder.conf.html

Round volume sizes up to 8GB boundaries. VxFlex OS/ScaleIO requires volumes to be sized in multiples of 8GB. If set to False, volume creation will fail for volumes not sized properly (boolean value) sio_round_volume_capacity = true
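If the backend really does round to 8GB boundaries, the size a given request would be rounded to can be sketched as below. This is my own illustration of the rounding rule quoted above, not code from cinder:

```shell
# Round a requested size (in GB) up to the next 8 GB boundary, as the
# sio_round_volume_capacity option described above would.
round_up_8() { echo $(( (($1 + 7) / 8) * 8 )); }
round_up_8 30   # prints 32
round_up_8 24   # prints 24
round_up_8 48   # prints 48
```

This would be consistent with the observed behavior: 8Gi-multiple increments land on a boundary, while others get rounded by the backend.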

dhiman360 avatar Jun 23 '23 03:06 dhiman360

@kayrus In this execution, although the journalctl log says "MountVolume.NodeExpandVolume succeeded", it never actually succeeded; see the lsblk output from the node at the end, which also did not change. Details of the test execution:


Original:
=====
master kubectl get pvc -A | grep registry
kube-system   ee-container-registry                                  Bound    dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006   10Gi       RWO            network-block   10h

master kubectl exec -it -n kube-system ee-container-registry-registry-7787b97786-7p5zd -- df -h
Defaulted container "registry" out of: registry, nginx-tls-terminator, sidecar
Filesystem      Size  Used Avail Use% Mounted on
overlay          24G  9.5G   15G  40% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda3        24G  9.5G   15G  40% /etc/hosts
shm              64M     0   64M   0% /dev/shm
/dev/vdb         16G   28K   16G   1% /var/lib/registry
tmpfs           5.0G  4.0K  5.0G   1% /etc/docker/registry
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /proc/scsi
tmpfs           3.9G     0  3.9G   0% /sys/firmware



TRY-1: 10Gi to 24Gi -> FAILED
===================
master kubectl edit pvc -n kube-system   ee-container-registry
persistentvolumeclaim/ee-container-registry edited

master kubectl get pvc -n kube-system   ee-container-registry
NAME                          STATUS   VOLUME                                                    CAPACITY   ACCESS MODES   STORAGECLASS    AGE
ee-container-registry   Bound    dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006   24Gi       RWO            network-block   10h
master kubectl exec -it -n kube-system ee-container-registry-registry-7787b97786-7p5zd -- df -h
Defaulted container "registry" out of: registry, nginx-tls-terminator, sidecar
Filesystem      Size  Used Avail Use% Mounted on
overlay          24G  9.5G   15G  40% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda3        24G  9.5G   15G  40% /etc/hosts
shm              64M     0   64M   0% /dev/shm
/dev/vdb         16G   28K   16G   1% /var/lib/registry <<----------- Unchanged
tmpfs           5.0G  4.0K  5.0G   1% /etc/docker/registry
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /proc/scsi
tmpfs           3.9G     0  3.9G   0% /sys/firmware




TRY-2: 24Gi to 30Gi -> FAILED
===================
master kubectl edit pvc -n kube-system   ee-container-registry
persistentvolumeclaim/ee-container-registry edited
master kubectl get pvc -n kube-system   ee-container-registry
NAME                          STATUS   VOLUME                                                    CAPACITY   ACCESS MODES   STORAGECLASS    AGE
ee-container-registry   Bound    dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006   30Gi       RWO            network-block   10h
master kubectl exec -it -n kube-system ee-container-registry-registry-7787b97786-7p5zd -- df -h
Defaulted container "registry" out of: registry, nginx-tls-terminator, sidecar
Filesystem      Size  Used Avail Use% Mounted on
overlay          24G  9.5G   15G  40% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda3        24G  9.5G   15G  40% /etc/hosts
shm              64M     0   64M   0% /dev/shm
/dev/vdb         16G   28K   16G   1% /var/lib/registry <<----------- Unchanged
tmpfs           5.0G  4.0K  5.0G   1% /etc/docker/registry
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /proc/scsi
tmpfs           3.9G     0  3.9G   0% /sys/firmware




TRY-3: 30Gi to 38Gi -> FAILED
===================
master kubectl edit pvc -n kube-system   ee-container-registry
persistentvolumeclaim/ee-container-registry edited
master kubectl get pvc -n kube-system   ee-container-registry
NAME                          STATUS   VOLUME                                                    CAPACITY   ACCESS MODES   STORAGECLASS    AGE
ee-container-registry   Bound    dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006   38Gi       RWO            network-block   10h
master kubectl exec -it -n kube-system ee-container-registry-registry-7787b97786-7p5zd -- df -h
Defaulted container "registry" out of: registry, nginx-tls-terminator, sidecar
Filesystem      Size  Used Avail Use% Mounted on
overlay          24G  9.5G   15G  40% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda3        24G  9.5G   15G  40% /etc/hosts
shm              64M     0   64M   0% /dev/shm
/dev/vdb         16G   28K   16G   1% /var/lib/registry <<----------- Unchanged
tmpfs           5.0G  4.0K  5.0G   1% /etc/docker/registry
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /proc/scsi
tmpfs           3.9G     0  3.9G   0% /sys/firmware



lsblk at the end.
=================
master ssh 10.0.16.4 lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1  678K  0 rom
vda    252:0    0   24G  0 disk
├─vda1 252:1    0    2M  0 part
├─vda2 252:2    0   33M  0 part /boot/efi
└─vda3 252:3    0   24G  0 part /var/lib/kubelet/pods/81c25228-6130-457d-b60c-2336cbca3449/volume-subpaths/sidecar-config/sidecar/0
                                /var/lib/kubelet/pods/81c25228-6130-457d-b60c-2336cbca3449/volume-subpaths/registry-config/registry/1
                                /opt/cni
                                /
vdb    252:16   0   16G  0 disk /var/lib/kubelet/pods/81c25228-6130-457d-b60c-2336cbca3449/volume-subpaths/dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006/registry/0  <<----------- Unchanged
                                /var/lib/kubelet/pods/81c25228-6130-457d-b60c-2336cbca3449/volumes/kubernetes.io~csi/dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006/mount
                                /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/51071c6dbeff83bda6423106eba6d23bcfc8d3657562f46b4db1e0e0b92640e4/globalmount
vdc    252:32   0    8G  0 disk /var/lib/kubelet/pods/663ad9b9-dc6d-4d25-b134-f23850df4181/volumes/kubernetes.io~csi/dddd-2-26-0-rc3-5ae9f1f7-5cbe-4354-9a81-68a95c1cc44f/mount
                                /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/904f93f9fdf99626cf577800c1536e99643e567c7cda061bd7ffb9bf2cef6cf8/globalmount


journalctl logs from the node
=============================
master ssh 10.0.16.4 journalctl | grep NodeExpandVolume
Jun 24 03:11:23 worker kubelet[4781]: I0624 03:11:23.639270    4781 operation_generator.go:2227] "MountVolume.NodeExpandVolume succeeded for volume \"dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006\" (UniqueName: \"kubernetes.io/csi/cinder.csi.openstack.org^37f48caa-f5ed-4b32-bd21-b24e1d227deb\") pod \"ee-container-registry-registry-7787b97786-7p5zd\" (UID: \"81c25228-6130-457d-b60c-2336cbca3449\") worker" pod="kube-system/ee-container-registry-registry-7787b97786-7p5zd"
Jun 24 03:12:37 worker kubelet[4781]: I0624 03:12:37.741528    4781 operation_generator.go:2227] "MountVolume.NodeExpandVolume succeeded for volume \"dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006\" (UniqueName: \"kubernetes.io/csi/cinder.csi.openstack.org^37f48caa-f5ed-4b32-bd21-b24e1d227deb\") pod \"ee-container-registry-registry-7787b97786-7p5zd\" (UID: \"81c25228-6130-457d-b60c-2336cbca3449\") worker" pod="kube-system/ee-container-registry-registry-7787b97786-7p5zd"
Jun 24 03:14:03 worker kubelet[4781]: I0624 03:14:03.656787    4781 operation_generator.go:2227] "MountVolume.NodeExpandVolume succeeded for volume \"dddd-2-26-0-rc3-7190afcf-543c-4a4a-a508-4f19b0ca2006\" (UniqueName: \"kubernetes.io/csi/cinder.csi.openstack.org^37f48caa-f5ed-4b32-bd21-b24e1d227deb\") pod \"ee-container-registry-registry-7787b97786-7p5zd\" (UID: \"81c25228-6130-457d-b60c-2336cbca3449\") worker" pod="kube-system/ee-container-registry-registry-7787b97786-7p5zd"

dhiman360 avatar Jun 24 '23 03:06 dhiman360

The issue here is that the expansion code never verifies that the disk was actually expanded. It should always validate that the disk geometry is the same as requested or larger, and only then return OK. Currently that check is only performed when rescan-on-resize is enabled, and it only works for iSCSI devices.

mape90 avatar Aug 14 '23 12:08 mape90
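
The check described above can be sketched as follows. This is a hedged illustration, not the plugin's actual code: `verify_expanded` is a made-up helper, and the byte values replay the numbers from TRY-1 (16 GiB device, 24 GiB request). On a real node the actual size would come from `blockdev --getsize64 /dev/vdb`.

```shell
# Sketch: treat NodeExpandVolume as successful only when the kernel's
# view of the block device is at least the requested capacity.
# verify_expanded ACTUAL_BYTES REQUIRED_BYTES -> returns 0 if the device grew
verify_expanded() {
  actual=$1
  required=$2
  if [ "$actual" -lt "$required" ]; then
    echo "resize not reflected: device is $actual bytes, requested $required" >&2
    return 1
  fi
  return 0
}

# Replaying TRY-1 above: the device is still 16 GiB after a 24 GiB request,
# so the check fails instead of reporting success like the kubelet log does.
verify_expanded $((16 * 1024 * 1024 * 1024)) $((24 * 1024 * 1024 * 1024)) \
  || echo "expansion must be retried, not reported as succeeded"
```

With such a check in place, NodeExpandVolume would return an error (and be retried) instead of logging "succeeded" while the filesystem stays at its old size.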