Mount failed on encrypted RBD device with "wrong fs type" error message
Describe the bug
Mount fails on an encrypted RBD device with a "wrong fs type" error message. The issue appears to be intermittent: a pod cannot be started, and when describing the pod status the following is visible:
Warning FailedMount 108s (x28 over 43m) kubelet MountVolume.MountDevice failed for volume "pvc-cc5127e5-2598-41a9-a742-f51290f28b08" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev,defaults /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b: wrong fs type, bad option, bad superblock on /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b, missing codepage or helper program, or other error.
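A minimal diagnostic sketch (our addition, not part of the attached logs) of what can be checked on the affected node while the mount is failing; the mapper device path is copied from the mount arguments above:

# Is the LUKS mapping active and backed by the expected rbd device?
cryptsetup status luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b

# What filesystem signature is visible on the opened mapper device? A missing or unexpected TYPE here
# leads to exactly the "wrong fs type, bad option, bad superblock" error when mount -t ext4 is attempted.
blkid /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b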
Environment details
- Image/version of Ceph CSI driver : quay.io/cephcsi/cephcsi:v3.13.0
- Helm chart version : rook-ceph v1.16.0
- Kernel version : 5.14.21-150500.55.83-default
- Mounter used for mounting PVC (for cephFS its fuse or kernel, for rbd its krbd or rbd-nbd) :
- Kubernetes cluster version : v1.31.1
- Ceph cluster version : quay.io/ceph/ceph:v19.2.0
Steps to reproduce
Steps to reproduce the behavior:
- Setup details: it is not known exactly how to reproduce; the issue comes randomly. On the cluster there are continuous reinstallations in the eric-eea-ns namespace, and in some cases a random pod cannot be started because of this PVC mount issue. What we can observe is that after a k8s cluster re-installation the issue may occur more frequently for some days.
- Deployment to trigger the issue '....'
- See error
Actual results
PVC cannot be mounted
Expected behavior
The PVC can be created and mounted without the described issue.
Logs
In the eric-eea-ns namespace the pod eric-eea-refdata-data-document-database-pg-1 cannot be started:
File: logs_eric-eea-ns_2025-03-14-01-12-26.tgz/describe/PODS/pods.txt
eric-eea-refdata-data-document-database-pg-1 0/3 ContainerCreating 0 43m <none> seliics07842e01 <none> <none>
When describing the pod (file: logs_eric-eea-ns_2025-03-14-01-12-26.tgz/describe/PODS/eric-eea-refdata-data-document-database-pg-1.yaml) the following is observed:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 44m default-scheduler 0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Normal Scheduled 44m default-scheduler Successfully assigned eric-eea-ns/eric-eea-refdata-data-document-database-pg-1 to seliics07842e01
Normal SuccessfulAttachVolume 44m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-cc5127e5-2598-41a9-a742-f51290f28b08"
Warning FailedMount 108s (x28 over 43m) kubelet MountVolume.MountDevice failed for volume "pvc-cc5127e5-2598-41a9-a742-f51290f28b08" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev,defaults /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b: wrong fs type, bad option, bad superblock on /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b, missing codepage or helper program, or other error.
The related PVC is pvc-cc5127e5-2598-41a9-a742-f51290f28b08.
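One hedged way to map the failing PVC to its backing RBD image and check it from the Ceph side (the volumeAttributes keys are the ones ceph-csi normally sets, and the rook-ceph-tools deployment assumes the standard Rook toolbox is installed; <pool>/<image> are placeholders for the values returned by the first command):

# Find the pool and RBD image backing the failing PV (keys normally populated by ceph-csi)
kubectl get pv pvc-cc5127e5-2598-41a9-a742-f51290f28b08 -o jsonpath='{.spec.csi.volumeAttributes.pool}{"/"}{.spec.csi.volumeAttributes.imageName}{"\n"}'

# From the Rook toolbox, check the image and whether anything is still watching (mapping) it
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd status <pool>/<image>
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd info <pool>/<image>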
In the rook-ceph namespace the following error is visible continuously (file: logs_rook-ceph_2025-03-14-01-26-46/logs/err/csi-rbdplugin-qtscn_csi-rbdplugin.err.txt):
I0314 00:28:52.535743 2028 mount_linux.go:452] `fsck` error fsck from util-linux 2.37.4
fsck: error 2 (No such file or directory) while executing fsck.ext4dev for /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b: wrong fs type, bad option, bad superblock on /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b, missing codepage or helper program, or other error.
E0314 00:28:52.540850 2028 nodeserver.go:842] ID: 290500 Req-ID: 0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b failed to mount device path (/dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b) to staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b) for volume (0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b) error: mount failed: exit status 32
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b: wrong fs type, bad option, bad superblock on /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b, missing codepage or helper program, or other error.
E0314 00:28:52.738512 2028 utils.go:271] ID: 290500 Req-ID: 0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b GRPC error: rpc error: code = Internal desc = mount failed: exit status 32
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b: wrong fs type, bad option, bad superblock on /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b, missing codepage or helper program, or other error.
I0314 00:28:55.895428 2028 mount_linux.go:452] `fsck` error fsck from util-linux 2.37.4
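A note on the fsck.ext4dev line above (our interpretation, not stated in the logs): fsck from util-linux probes the device for its filesystem type and then runs the matching fsck.<type> helper, so attempting fsck.ext4dev means the device was detected as ext4dev rather than ext4 at that moment, which is consistent with a superblock that was incompletely written or damaged; the "No such file or directory" part only means the cephcsi image does not ship an fsck.ext4dev helper. Two hedged checks:

# On the node, dump the superblock of the mapper device read-only and see whether it still looks like a sane ext4 superblock
dumpe2fs -h /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b

# In the nodeplugin pod from the log above, list which fsck helpers are actually present (container name assumed to be csi-rbdplugin)
kubectl -n rook-ceph exec csi-rbdplugin-qtscn -c csi-rbdplugin -- sh -c 'ls /usr/sbin/fsck.* /sbin/fsck.* 2>/dev/null'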
In the /var/log/messages file on the node the following is observed:
2025-03-14T01:28:52.739124+01:00 seliics07842e01 kubelet[29074]: E0314 01:28:52.739039 29074 csi_attacher.go:366] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = Internal desc = mount failed: exit status 32
2025-03-14T01:28:52.739262+01:00 seliics07842e01 kubelet[29074]: Mounting command: mount
2025-03-14T01:28:52.739321+01:00 seliics07842e01 kubelet[29074]: Mounting arguments: -t ext4 -o _netdev,defaults /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b
2025-03-14T01:28:52.739366+01:00 seliics07842e01 kubelet[29074]: Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b: wrong fs type, bad option, bad superblock on /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b, missing codepage or helper program, or other error.
2025-03-14T01:28:52.739420+01:00 seliics07842e01 kubelet[29074]: E0314 01:28:52.739262 29074 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b podName: nodeName:}" failed. No retries permitted until 2025-03-14 01:28:53.239241093 +0100 CET m=+1239617.995228529 (durationBeforeRetry 500ms). Error: MountVolume.MountDevice failed for volume "pvc-cc5127e5-2598-41a9-a742-f51290f28b08" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b") pod "eric-eea-refdata-data-document-database-pg-1" (UID: "a5774eb8-a4d2-4708-92c7-3d092b1580cb") : rpc error: code = Internal desc = mount failed: exit status 32
2025-03-14T01:28:52.739506+01:00 seliics07842e01 kubelet[29074]: Mounting command: mount
2025-03-14T01:28:52.739539+01:00 seliics07842e01 kubelet[29074]: Mounting arguments: -t ext4 -o _netdev,defaults /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b
2025-03-14T01:28:52.739574+01:00 seliics07842e01 kubelet[29074]: Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/2677416184d1804456c8cda2e754b18d3359f0d524484b48ec7f534aba3fe540/globalmount/0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b: wrong fs type, bad option, bad superblock on /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b, missing codepage or helper program, or other error.
2025-03-14T01:28:52.749315+01:00 seliics07842e01 systemd[1]: cri-containerd-5f9ca82db63019c5d49dd0af0a95f5af5c4ddb47007cbddd691949f6981e7181.scope: Deactivated successfully.
2025-03-14T01:28:52.816663+01:00 seliics07842e01 systemd[1]: run-containerd-io.containerd.runtime.v2.task-k8s.io-5f9ca82db63019c5d49dd0af0a95f5af5c4ddb47007cbddd691949f6981e7181-rootfs.mount: Deactivated successfully.
2025-03-14T01:28:53.005385+01:00 seliics07842e01 systemd[1]: Started libcontainer container 3f0d37beb1e8fc21d22a93a9986ff5f6cba2204f846a74610c228a15adf461c8.
2025-03-14T01:28:53.312877+01:00 seliics07842e01 kubelet[29074]: I0314 01:28:53.312308 29074 operation_generator.go:538] "MountVolume.WaitForAttach entering for volume \"pvc-cc5127e5-2598-41a9-a742-f51290f28b08\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b\") pod \"eric-eea-refdata-data-document-database-pg-1\" (UID: \"a5774eb8-a4d2-4708-92c7-3d092b1580cb\") DevicePath \"\"" pod="eric-eea-ns/eric-eea-refdata-data-document-database-pg-1"
2025-03-14T01:28:53.314864+01:00 seliics07842e01 (udev-worker)[27815]: dm-23: Failed to create/update device symlink '/dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-10ab64ad-094a-41f6-b65f-b9fa2b7848cb', ignoring: File exists
2025-03-14T01:28:53.315613+01:00 seliics07842e01 kubelet[29074]: I0314 01:28:53.315489 29074 operation_generator.go:548] "MountVolume.WaitForAttach succeeded for volume \"pvc-cc5127e5-2598-41a9-a742-f51290f28b08\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b\") pod \"eric-eea-refdata-data-document-database-pg-1\" (UID: \"a5774eb8-a4d2-4708-92c7-3d092b1580cb\") DevicePath \"csi-d60a1d1fb2afebbee9fc3f3ea1f324638a6c760b88bff7ac604f4c2cb77635df\"" pod="eric-eea-ns/eric-eea-refdata-data-document-database-pg-1"
The log files are attached.
Additional context
The error messages and symptoms are the same as those reported in https://github.com/ceph/ceph-csi/issues/3913
Info regarding the setup and attached log files:
- rook-ceph has its own dedicated namespace; every log from that namespace is collected in the attached file logs_rook-ceph_2025-03-14-01-26-46.tgz
- kube-system namespace logs are in the file logs_kube-system_2025-03-14-01-26-22.tgz
- The product under test is deployed in the eric-eea-ns namespace; its logs are collected in logs_eric-eea-ns_2025-03-14-01-12-26.tgz
- /var/log/messages and dmesg logs are attached in seliics07842e01.zip
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
issue is valid, please investigate
"wrong fs type, bad option, bad superblock on /dev/.., missing codepage or helper program, or other error." is an error from mount.ext4 that is commonly reported when a volume was in use while a node rebooted. This is not limited to encrypted volumes; it can happen with unencrypted volumes as well.
Is there something in your testing that reboots nodes without draining the running pods first? A filesystem like ext4 can get corrupted (you'll get the above error when mounting), and depending on the corruption, fsck may or may not be able to fix it.
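If corruption is suspected, a possible manual check/repair (not from this thread; the device path is taken from the logs above, <pool>/<image> are placeholders, and the volume must not be mounted anywhere while doing this):

# Optional safety net: snapshot the backing RBD image first via the Rook toolbox (hypothetical names)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd snap create <pool>/<image>@before-fsck

# Read-only check first; -n answers "no" to all repair prompts
e2fsck -n /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b

# Only if damage is confirmed and the filesystem is unmounted, attempt a forced repair
e2fsck -f -y /dev/mapper/luks-rbd-0001-0009-rook-ceph-0000000000000001-b7c610dc-61bc-4136-b03d-ae5c86fef99b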
No reboot happened around the time of the error. The scenario is: an integration Helm chart is deployed on the k8s cluster, multiple application pods are deployed and request PVCs, and in some cases one of them cannot mount its PVC due to the reported issue while the others mount theirs fine. After repeating the same installation on the same k8s cluster, it works without problems. The issue comes randomly and the node is not rebooted. All logs are attached.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
The issue is not seen after upgrading to Rook v1.17.7 with Ceph 19.2.2. The dimensioning of some components was also modified.