ceph-csi-cephfs 3.7.0 no longer attaches the volume to the pod

thrashwerk opened this issue · 12 comments

Describe the bug

After updating the ceph-csi-cephfs Helm chart to 3.7.0, CephFS PVCs no longer get mounted into the pod and the pod never starts.

We have static CephFS filesystems already created in Ceph; we have manually created PVs and PVCs for these filesystems, and we use those PVCs to mount CephFS in pods.

For example:

    volumeMounts:
      - name: ceph-pvc-smaug
        mountPath: /mnt
  volumes:
    - name: ceph-pvc-smaug
      persistentVolumeClaim:
        claimName: cephfs-read-smaug

After creating a pod with those mounts, it gets stuck in the ContainerCreating state with the event: Warning FailedMount 101s kubelet Unable to attach or mount volumes: unmounted volumes=[ceph-pvc-smaug], unattached volumes=[cephfs-pvc-storage1 kube-api-access-2q26t]: timed out waiting for the condition.

Everything was working on 3.6.2.
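
For anyone trying to reproduce the checks above, a minimal sketch of the commands involved (object names are the ones used in this report and need adjusting for other setups):

# Pod stays in ContainerCreating; the FailedMount event shows up here
kubectl -n data-platform describe pod pvc-test

# The static PVC/PV pair itself stays Bound throughout
kubectl -n data-platform get pvc cephfs-read-smaug
kubectl get pv data-platform-cephfs-read-smaug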

Environment details

  • Image/version of Ceph CSI driver : 3.7.0
  • Helm chart version : 3.7.0
  • Kernel version : 5.15.0
  • Mounter used for mounting PVC (for cephFS its fuse or kernel. for rbd its krbd or rbd-nbd) : kernel (I guess)
  • Kubernetes cluster version : 1.24
  • Ceph cluster version : 16.2.9

Steps to reproduce

Steps to reproduce the behavior:

  1. Update ceph-csi-cephfs Helm chart to 3.7.0
  2. Create a pod with volume backed by CephFS PVC

Actual results

PVC not mounted.

Expected behavior

Pod gets created with the mounted PVC.

Logs

If the issue is in PVC mounting, please attach complete logs of the containers below.

  • csi-cephfsplugin logs:
I0914 07:50:20.199977 2872353 cephcsi.go:192] Driver version: v3.7.0 and Git version: 34fd27bbd1b9e0efff48805d025a1069b5dbbc53
I0914 07:50:20.200163 2872353 cephcsi.go:210] Initial PID limit is set to 154277
I0914 07:50:20.200189 2872353 cephcsi.go:216] Reconfigured PID limit to -1 (max)
I0914 07:50:20.200207 2872353 cephcsi.go:241] Starting driver type: cephfs with name: cephfs.csi.ceph.com
I0914 07:50:20.210635 2872353 volumemounter.go:79] loaded mounter: kernel
I0914 07:50:20.223808 2872353 volumemounter.go:90] loaded mounter: fuse
I0914 07:50:20.225606 2872353 mount_linux.go:283] Detected umount with safe 'not mounted' behavior
I0914 07:50:20.225841 2872353 server.go:126] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0914 07:50:20.991741 2872353 utils.go:195] ID: 1 GRPC call: /csi.v1.Identity/GetPluginInfo
I0914 07:50:20.993750 2872353 utils.go:199] ID: 1 GRPC request: {}
I0914 07:50:20.993768 2872353 identityserver-default.go:39] ID: 1 Using default GetPluginInfo
I0914 07:50:20.993840 2872353 utils.go:206] ID: 1 GRPC response: {"name":"cephfs.csi.ceph.com","vendor_version":"v3.7.0"}
I0914 07:50:21.042063 2872353 utils.go:195] ID: 2 GRPC call: /csi.v1.Node/NodeGetInfo
I0914 07:50:21.042118 2872353 utils.go:199] ID: 2 GRPC request: {}
I0914 07:50:21.042128 2872353 nodeserver-default.go:51] ID: 2 Using default NodeGetInfo
I0914 07:50:21.042250 2872353 utils.go:206] ID: 2 GRPC response: {"accessible_topology":{},"node_id":"scrap-k8s-node074"}
I0914 07:51:20.396249 2872353 utils.go:195] ID: 3 GRPC call: /csi.v1.Identity/Probe
I0914 07:51:20.396313 2872353 utils.go:199] ID: 3 GRPC request: {}
I0914 07:51:20.396430 2872353 utils.go:206] ID: 3 GRPC response: {}
I0914 07:52:20.368826 2872353 utils.go:195] ID: 4 GRPC call: /csi.v1.Identity/Probe
I0914 07:52:20.368952 2872353 utils.go:199] ID: 4 GRPC request: {}
I0914 07:52:20.368995 2872353 utils.go:206] ID: 4 GRPC response: {}
I0914 07:53:20.421603 2872353 utils.go:195] ID: 5 GRPC call: /csi.v1.Identity/Probe
I0914 07:53:20.421651 2872353 utils.go:199] ID: 5 GRPC request: {}
I0914 07:53:20.421675 2872353 utils.go:206] ID: 5 GRPC response: {}
I0914 07:54:20.378598 2872353 utils.go:195] ID: 6 GRPC call: /csi.v1.Identity/Probe
I0914 07:54:20.378704 2872353 utils.go:199] ID: 6 GRPC request: {}
I0914 07:54:20.378737 2872353 utils.go:206] ID: 6 GRPC response: {}
I0914 07:54:20.410779 2872353 utils.go:195] ID: 7 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:54:20.410840 2872353 utils.go:199] ID: 7 GRPC request: {}
I0914 07:54:20.411054 2872353 utils.go:206] ID: 7 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:54:20.416108 2872353 utils.go:195] ID: 8 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:54:20.416165 2872353 utils.go:199] ID: 8 GRPC request: {}
I0914 07:54:20.416284 2872353 utils.go:206] ID: 8 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:54:20.417080 2872353 utils.go:195] ID: 9 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:54:20.417176 2872353 utils.go:199] ID: 9 GRPC request: {}
I0914 07:54:20.417294 2872353 utils.go:206] ID: 9 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:54:20.418622 2872353 utils.go:195] ID: 10 Req-ID: data-platform-cephfs-read-smaug GRPC call: /csi.v1.Node/NodeStageVolume
I0914 07:54:20.418891 2872353 utils.go:199] ID: 10 Req-ID: data-platform-cephfs-read-smaug GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":3}},"volume_context":{"clusterID":"96df99f6-fc1a-11ea-90a4-6cb3113cb732","fsName":"storage","kernelMountOptions":"recover_session=clean,ro","rootPath":"/smaug","staticVolume":"true"},"volume_id":"data-platform-cephfs-read-smaug"}
I0914 07:54:20.419175 2872353 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I0914 07:54:20.419238 2872353 nodeserver.go:187] ID: 10 Req-ID: data-platform-cephfs-read-smaug cephfs: volume data-platform-cephfs-read-smaug is already mounted to /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount, skipping
I0914 07:54:20.419271 2872353 utils.go:206] ID: 10 Req-ID: data-platform-cephfs-read-smaug GRPC response: {}
I0914 07:54:20.420168 2872353 utils.go:195] ID: 11 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:54:20.420220 2872353 utils.go:199] ID: 11 GRPC request: {}
I0914 07:54:20.420343 2872353 utils.go:206] ID: 11 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:54:20.421056 2872353 utils.go:195] ID: 12 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:54:20.421105 2872353 utils.go:199] ID: 12 GRPC request: {}
I0914 07:54:20.421252 2872353 utils.go:206] ID: 12 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:54:20.422052 2872353 utils.go:195] ID: 13 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:54:20.422100 2872353 utils.go:199] ID: 13 GRPC request: {}
I0914 07:54:20.422216 2872353 utils.go:206] ID: 13 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:54:20.423125 2872353 utils.go:195] ID: 14 Req-ID: data-platform-cephfs-read-smaug GRPC call: /csi.v1.Node/NodePublishVolume
I0914 07:54:20.423298 2872353 utils.go:199] ID: 14 Req-ID: data-platform-cephfs-read-smaug GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount","target_path":"/var/lib/kubelet/pods/df67067e-0b04-4630-aba2-8ab3995c775d/volumes/kubernetes.io~csi/data-platform-cephfs-read-smaug/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":3}},"volume_context":{"clusterID":"96df99f6-fc1a-11ea-90a4-6cb3113cb732","fsName":"storage","kernelMountOptions":"recover_session=clean,ro","rootPath":"/smaug","staticVolume":"true"},"volume_id":"data-platform-cephfs-read-smaug"}
I0914 07:54:20.426100 2872353 cephcmds.go:105] ID: 14 Req-ID: data-platform-cephfs-read-smaug command succeeded: mount [-o bind,_netdev /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount /var/lib/kubelet/pods/df67067e-0b04-4630-aba2-8ab3995c775d/volumes/kubernetes.io~csi/data-platform-cephfs-read-smaug/mount]
I0914 07:54:20.426127 2872353 nodeserver.go:467] ID: 14 Req-ID: data-platform-cephfs-read-smaug cephfs: successfully bind-mounted volume data-platform-cephfs-read-smaug to /var/lib/kubelet/pods/df67067e-0b04-4630-aba2-8ab3995c775d/volumes/kubernetes.io~csi/data-platform-cephfs-read-smaug/mount
I0914 07:54:20.426175 2872353 utils.go:206] ID: 14 Req-ID: data-platform-cephfs-read-smaug GRPC response: {}
I0914 07:55:13.663939 2872353 utils.go:195] ID: 15 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:55:13.664007 2872353 utils.go:199] ID: 15 GRPC request: {}
I0914 07:55:13.664119 2872353 utils.go:206] ID: 15 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:55:13.814538 2872353 utils.go:195] ID: 16 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:55:13.814586 2872353 utils.go:199] ID: 16 GRPC request: {}
I0914 07:55:13.814681 2872353 utils.go:206] ID: 16 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:55:13.815583 2872353 utils.go:195] ID: 17 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:55:13.815650 2872353 utils.go:199] ID: 17 GRPC request: {}
I0914 07:55:13.815768 2872353 utils.go:206] ID: 17 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:55:13.816558 2872353 utils.go:195] ID: 18 Req-ID: parsing-cephfs-write-workdir GRPC call: /csi.v1.Node/NodeStageVolume
I0914 07:55:13.816687 2872353 utils.go:199] ID: 18 Req-ID: parsing-cephfs-write-workdir GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/001e385c45ece6512ca7f68d86b0aac955f36c504fd1ad501397e610d0046a31/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"96df99f6-fc1a-11ea-90a4-6cb3113cb732","fsName":"storage","kernelMountOptions":"recover_session=clean","rootPath":"/workdir","staticVolume":"true"},"volume_id":"parsing-cephfs-write-workdir"}
I0914 07:55:13.816869 2872353 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I0914 07:55:13.816928 2872353 nodeserver.go:187] ID: 18 Req-ID: parsing-cephfs-write-workdir cephfs: volume parsing-cephfs-write-workdir is already mounted to /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/001e385c45ece6512ca7f68d86b0aac955f36c504fd1ad501397e610d0046a31/globalmount, skipping
I0914 07:55:13.816954 2872353 utils.go:206] ID: 18 Req-ID: parsing-cephfs-write-workdir GRPC response: {}
I0914 07:55:13.817718 2872353 utils.go:195] ID: 19 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:55:13.817789 2872353 utils.go:199] ID: 19 GRPC request: {}
I0914 07:55:13.817915 2872353 utils.go:206] ID: 19 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:55:13.818790 2872353 utils.go:195] ID: 20 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:55:13.818832 2872353 utils.go:199] ID: 20 GRPC request: {}
I0914 07:55:13.818927 2872353 utils.go:206] ID: 20 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:55:13.819680 2872353 utils.go:195] ID: 21 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:55:13.819720 2872353 utils.go:199] ID: 21 GRPC request: {}
I0914 07:55:13.819808 2872353 utils.go:206] ID: 21 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:55:13.820509 2872353 utils.go:195] ID: 22 Req-ID: parsing-cephfs-write-workdir GRPC call: /csi.v1.Node/NodePublishVolume
I0914 07:55:13.820633 2872353 utils.go:199] ID: 22 Req-ID: parsing-cephfs-write-workdir GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/001e385c45ece6512ca7f68d86b0aac955f36c504fd1ad501397e610d0046a31/globalmount","target_path":"/var/lib/kubelet/pods/0b09d9e7-fb01-423e-92be-8660c4367c42/volumes/kubernetes.io~csi/parsing-cephfs-write-workdir/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"96df99f6-fc1a-11ea-90a4-6cb3113cb732","fsName":"storage","kernelMountOptions":"recover_session=clean","rootPath":"/workdir","staticVolume":"true"},"volume_id":"parsing-cephfs-write-workdir"}
I0914 07:55:13.823710 2872353 cephcmds.go:105] ID: 22 Req-ID: parsing-cephfs-write-workdir command succeeded: mount [-o bind,_netdev /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/001e385c45ece6512ca7f68d86b0aac955f36c504fd1ad501397e610d0046a31/globalmount /var/lib/kubelet/pods/0b09d9e7-fb01-423e-92be-8660c4367c42/volumes/kubernetes.io~csi/parsing-cephfs-write-workdir/mount]
I0914 07:55:13.823754 2872353 nodeserver.go:467] ID: 22 Req-ID: parsing-cephfs-write-workdir cephfs: successfully bind-mounted volume parsing-cephfs-write-workdir to /var/lib/kubelet/pods/0b09d9e7-fb01-423e-92be-8660c4367c42/volumes/kubernetes.io~csi/parsing-cephfs-write-workdir/mount
I0914 07:55:13.823782 2872353 utils.go:206] ID: 22 Req-ID: parsing-cephfs-write-workdir GRPC response: {}
I0914 07:55:20.422243 2872353 utils.go:195] ID: 23 GRPC call: /csi.v1.Identity/Probe
I0914 07:55:20.422324 2872353 utils.go:199] ID: 23 GRPC request: {}
I0914 07:55:20.422347 2872353 utils.go:206] ID: 23 GRPC response: {}
I0914 07:56:07.685383 2872353 utils.go:195] ID: 24 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:56:07.685453 2872353 utils.go:199] ID: 24 GRPC request: {}
I0914 07:56:07.685587 2872353 utils.go:206] ID: 24 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:56:07.686780 2872353 utils.go:195] ID: 25 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0914 07:56:07.686913 2872353 utils.go:199] ID: 25 GRPC request: {"volume_id":"parsing-cephfs-write-workdir","volume_path":"/var/lib/kubelet/pods/0b09d9e7-fb01-423e-92be-8660c4367c42/volumes/kubernetes.io~csi/parsing-cephfs-write-workdir/mount"}
I0914 07:56:07.688022 2872353 utils.go:206] ID: 25 GRPC response: {"usage":[{"available":147697734516736,"total":513936440950784,"unit":1,"used":366238706434048},{"total":337453544,"unit":2,"used":337453545}]}
I0914 07:56:20.388033 2872353 utils.go:195] ID: 26 GRPC call: /csi.v1.Identity/Probe
I0914 07:56:20.388352 2872353 utils.go:199] ID: 26 GRPC request: {}
I0914 07:56:20.388569 2872353 utils.go:206] ID: 26 GRPC response: {}
I0914 07:57:20.422717 2872353 utils.go:195] ID: 27 GRPC call: /csi.v1.Identity/Probe
I0914 07:57:20.422767 2872353 utils.go:199] ID: 27 GRPC request: {}
I0914 07:57:20.422789 2872353 utils.go:206] ID: 27 GRPC response: {}
I0914 07:58:05.092902 2872353 utils.go:195] ID: 28 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0914 07:58:05.092981 2872353 utils.go:199] ID: 28 GRPC request: {}
I0914 07:58:05.093108 2872353 utils.go:206] ID: 28 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0914 07:58:05.094114 2872353 utils.go:195] ID: 29 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0914 07:58:05.094174 2872353 utils.go:199] ID: 29 GRPC request: {"volume_id":"parsing-cephfs-write-workdir","volume_path":"/var/lib/kubelet/pods/0b09d9e7-fb01-423e-92be-8660c4367c42/volumes/kubernetes.io~csi/parsing-cephfs-write-workdir/mount"}
I0914 07:58:05.094918 2872353 utils.go:206] ID: 29 GRPC response: {"usage":[{"available":147693942865920,"total":513932653494272,"unit":1,"used":366238710628352},{"total":337453550,"unit":2,"used":337453551}]}
I0914 07:58:20.397671 2872353 utils.go:195] ID: 30 GRPC call: /csi.v1.Identity/Probe
I0914 07:58:20.397792 2872353 utils.go:199] ID: 30 GRPC request: {}
I0914 07:58:20.397918 2872353 utils.go:206] ID: 30 GRPC response: {}
  • driver-registrar logs:
I0914 07:50:19.981828 2872319 main.go:166] Version: v2.5.1
I0914 07:50:19.981874 2872319 main.go:167] Running node-driver-registrar in mode=registration
I0914 07:50:19.984157 2872319 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0914 07:50:19.984181 2872319 connection.go:154] Connecting to unix:///csi/csi.sock
I0914 07:50:20.985220 2872319 main.go:198] Calling CSI driver to discover driver name
I0914 07:50:20.985254 2872319 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0914 07:50:20.985261 2872319 connection.go:184] GRPC request: {}
I0914 07:50:20.994617 2872319 connection.go:186] GRPC response: {"name":"cephfs.csi.ceph.com","vendor_version":"v3.7.0"}
I0914 07:50:20.994731 2872319 connection.go:187] GRPC error: <nil>
I0914 07:50:20.994742 2872319 main.go:208] CSI driver name: "cephfs.csi.ceph.com"
I0914 07:50:20.994781 2872319 node_register.go:53] Starting Registration Server at: /registration/cephfs.csi.ceph.com-reg.sock
I0914 07:50:20.995049 2872319 node_register.go:62] Registration Server started at: /registration/cephfs.csi.ceph.com-reg.sock
I0914 07:50:20.995205 2872319 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0914 07:50:21.040494 2872319 main.go:102] Received GetInfo call: &InfoRequest{}
I0914 07:50:21.040948 2872319 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/cephfs.csi.ceph.com/registration"
I0914 07:50:21.067754 2872319 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

Additional context

Could the change in #3149 possibly be related?

thrashwerk · Sep 14 '22

@thrashwerk, the logs don't contain information about the PVC cephfs-read-smaug you mentioned. Did you check whether that PVC got to the Bound state?

I do see a few other CephFS PVCs working fine, though.

Rakshith-R · Sep 14 '22

It's been bound for over 100 days; it's not a new PVC. We create them manually once and reuse them.

thrashwerk · Sep 14 '22

Maybe related to this issue: https://github.com/ceph/ceph-csi/issues/3309

Informize · Sep 15 '22

I can also confirm this issue: static CephFS PVs I created by hand, which had been working for the past few months, broke following the 3.7.0 update.

geraldwuhoo · Sep 16 '22

@geraldwuhoo Can you please provide pvc/pv and app yaml files?

Madhu-1 · Sep 19 '22

Here's an example of my configuration.

PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2022-05-31T13:40:40Z"
  name: data-platform-cephfs-read-smaug
spec:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: cephfs-read-smaug
    namespace: data-platform
  csi:
    driver: cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: cephfs-client-storage-read-smaug-creds
      namespace: ceph-csi-cephfs
    volumeAttributes:
      clusterID: 96df99f6
      fsName: storage
      kernelMountOptions: recover_session=clean,ro
      rootPath: /smaug
      staticVolume: "true"
    volumeHandle: data-platform-cephfs-read-smaug
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
status:
  phase: Bound

PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
  creationTimestamp: "2022-05-31T13:40:40Z"
  name: cephfs-read-smaug
  namespace: data-platform
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ""
  volumeMode: Filesystem
  volumeName: data-platform-cephfs-read-smaug
status:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 1Gi
  phase: Bound
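
A couple of quick sanity checks on top of these specs (a sketch; the driver name is the default cephfs.csi.ceph.com and the node name is taken from the nodeplugin logs above):

# PV should report STATUS Bound with CLAIM data-platform/cephfs-read-smaug
kubectl get pv data-platform-cephfs-read-smaug

# The CSINode object for the node running the pod should list
# cephfs.csi.ceph.com under spec.drivers (the registrar log says it registered)
kubectl get csinode scrap-k8s-node074 -o yaml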

thrashwerk · Sep 21 '22

@thrashwerk Thanks for the PVC/PV spec. Can you please also provide the pod.yaml?

Madhu-1 · Sep 22 '22

This is the one I used to test it specifically:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: "yes"
  name: pvc-test
  namespace: data-platform
spec:
  containers:
  - command:
    - tail
    - -f
    - /dev/null
    image: opensuse/leap
    name: opensuse
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
    volumeMounts:
      - name: ceph-pvc-smaug
        mountPath: /mnt
  volumes:
    - name: ceph-pvc-smaug
      persistentVolumeClaim:
        claimName: cephfs-read-smaug

thrashwerk · Sep 22 '22

I've tried updating to 3.7.1 with not much luck but I did dig a bit more and I found some interesting things.

  • Updated to 3.7.1
  • Created the same pod using the same old PVC
  • Checked mounts on the host where the pod was scheduled
REDACTED:/smaug on /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount type ceph (ro,relatime,name=storage.read-smaug,secret=<hidden>,fsid=00000000-0000-0000-0000-000000000000,acl,mds_namespace=storage,recover_session=clean,_netdev)
REDACTED:/smaug on /var/lib/kubelet/pods/65c2f352-0521-463e-aebd-bb1e7f14ce9c/volumes/kubernetes.io~csi/data-platform-cephfs-read-smaug/mount type ceph (ro,relatime,name=storage.read-smaug,secret=<hidden>,fsid=00000000-0000-0000-0000-000000000000,acl,mds_namespace=storage,recover_session=clean,_netdev)
  • The mounts work correctly from the host

So it seems ceph-csi creates the mounts correctly but doesn't communicate back to K8s that it's ready and the pod can now be created?
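
A sketch of how that host-side check can be done on the node where the pod was scheduled (the path is the targetPath value from the logs below):

# Both the staging mount and the per-pod bind mount should show up as type ceph
findmnt -t ceph

# Confirm the exact target path handed to the plugin is really a mountpoint
mountpoint /var/lib/kubelet/pods/65c2f352-0521-463e-aebd-bb1e7f14ce9c/volumes/kubernetes.io~csi/data-platform-cephfs-read-smaug/mount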

Some logs from the nodeplugin pod's csi-cephfsplugin container:

I1004 11:09:12.766490  392133 utils.go:195] ID: 7 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1004 11:09:12.766553  392133 utils.go:206] ID: 7 GRPC request: {}
I1004 11:09:12.766685  392133 utils.go:212] ID: 7 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1004 11:09:12.771501  392133 utils.go:195] ID: 8 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1004 11:09:12.771566  392133 utils.go:206] ID: 8 GRPC request: {}
I1004 11:09:12.771703  392133 utils.go:212] ID: 8 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1004 11:09:12.772485  392133 utils.go:195] ID: 9 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1004 11:09:12.772532  392133 utils.go:206] ID: 9 GRPC request: {}
I1004 11:09:12.772624  392133 utils.go:212] ID: 9 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1004 11:09:12.773559  392133 utils.go:195] ID: 10 Req-ID: data-platform-cephfs-read-smaug GRPC call: /csi.v1.Node/NodeStageVolume
I1004 11:09:12.773813  392133 utils.go:206] ID: 10 Req-ID: data-platform-cephfs-read-smaug GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":3}},"volume_context":{"clusterID":"96df99f6-fc1a-11ea-90a4-6cb3113cb732","fsName":"storage","kernelMountOptions":"recover_session=clean,ro","rootPath":"/smaug","staticVolume":"true"},"volume_id":"data-platform-cephfs-read-smaug"}
I1004 11:09:12.774018  392133 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I1004 11:09:12.774091  392133 nodeserver.go:247] ID: 10 Req-ID: data-platform-cephfs-read-smaug cephfs: mounting volume data-platform-cephfs-read-smaug with Ceph kernel client
I1004 11:09:12.776395  392133 cephcmds.go:105] ID: 10 Req-ID: data-platform-cephfs-read-smaug command succeeded: modprobe [ceph]
I1004 11:09:12.961296  392133 cephcmds.go:105] ID: 10 Req-ID: data-platform-cephfs-read-smaug command succeeded: mount [-t ceph REDACTED:/smaug /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount -o name=storage.read-smaug,secretfile=/tmp/csi/keys/keyfile-1721295337,mds_namespace=storage,recover_session=clean,ro,_netdev]
I1004 11:09:12.961353  392133 nodeserver.go:206] ID: 10 Req-ID: data-platform-cephfs-read-smaug cephfs: successfully mounted volume data-platform-cephfs-read-smaug to /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount
I1004 11:09:12.961406  392133 utils.go:212] ID: 10 Req-ID: data-platform-cephfs-read-smaug GRPC response: {}
I1004 11:09:12.962263  392133 utils.go:195] ID: 11 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1004 11:09:12.962319  392133 utils.go:206] ID: 11 GRPC request: {}
I1004 11:09:12.962419  392133 utils.go:212] ID: 11 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1004 11:09:12.963108  392133 utils.go:195] ID: 12 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1004 11:09:12.963152  392133 utils.go:206] ID: 12 GRPC request: {}
I1004 11:09:12.963249  392133 utils.go:212] ID: 12 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1004 11:09:12.963935  392133 utils.go:195] ID: 13 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1004 11:09:12.963982  392133 utils.go:206] ID: 13 GRPC request: {}
I1004 11:09:12.964082  392133 utils.go:212] ID: 13 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1004 11:09:12.964945  392133 utils.go:195] ID: 14 Req-ID: data-platform-cephfs-read-smaug GRPC call: /csi.v1.Node/NodePublishVolume
I1004 11:09:12.965179  392133 utils.go:206] ID: 14 Req-ID: data-platform-cephfs-read-smaug GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount","target_path":"/var/lib/kubelet/pods/65c2f352-0521-463e-aebd-bb1e7f14ce9c/volumes/kubernetes.io~csi/data-platform-cephfs-read-smaug/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":3}},"volume_context":{"clusterID":"96df99f6-fc1a-11ea-90a4-6cb3113cb732","fsName":"storage","kernelMountOptions":"recover_session=clean,ro","rootPath":"/smaug","staticVolume":"true"},"volume_id":"data-platform-cephfs-read-smaug"}
I1004 11:09:12.968443  392133 cephcmds.go:105] ID: 14 Req-ID: data-platform-cephfs-read-smaug command succeeded: mount [-o bind,_netdev /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/cdc6743d9f28807918659cd1a98dcd8722c76e1e504072c73d109c5ecec66fc7/globalmount /var/lib/kubelet/pods/65c2f352-0521-463e-aebd-bb1e7f14ce9c/volumes/kubernetes.io~csi/data-platform-cephfs-read-smaug/mount]
I1004 11:09:12.968469  392133 nodeserver.go:467] ID: 14 Req-ID: data-platform-cephfs-read-smaug cephfs: successfully bind-mounted volume data-platform-cephfs-read-smaug to /var/lib/kubelet/pods/65c2f352-0521-463e-aebd-bb1e7f14ce9c/volumes/kubernetes.io~csi/data-platform-cephfs-read-smaug/mount
I1004 11:09:12.968524  392133 utils.go:212] ID: 14 Req-ID: data-platform-cephfs-read-smaug GRPC response: {}

thrashwerk · Oct 04 '22

I have tested with cephcsi v3.7.0 (quay.io/cephcsi/cephcsi:v3.7.0) and everything works fine for me:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
  creationTimestamp: "2022-10-10T09:44:56Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: cephfs-static-pvc
  namespace: default
  resourceVersion: "5362"
  uid: 79e0ac7a-de40-4e60-b52d-d9e3e79930e5
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  volumeMode: Filesystem
  volumeName: cephfs-static-pv
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1Gi
  phase: Bound
---
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2022-10-10T09:44:51Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: cephfs-static-pv
  resourceVersion: "5358"
  uid: d1c849bd-7206-435f-9f5c-9247eb52b56d
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: cephfs-static-pvc
    namespace: default
    resourceVersion: "5356"
    uid: 79e0ac7a-de40-4e60-b52d-d9e3e79930e5
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-cephfs-node
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: myfs
      kernelMountOptions: recover_session=clean
      rootPath: /test
      staticVolume: "true"
    volumeHandle: cephfs-static-pv
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
status:
  phase: Bound
$ cat static-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
spec:
  containers:
  - command:
    - tail
    - -f
    - /dev/null
    image: opensuse/leap
    name: opensuse
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
    volumeMounts:
      - name: ceph-static-pvc
        mountPath: /mnt
  volumes:
    - name: ceph-static-pvc
      persistentVolumeClaim:
        claimName: cephfs-static-pvc
sh-4.4# df -h
Filesystem                Size  Used Avail Use% Mounted on
overlay                    28G  5.6G   21G  22% /
tmpfs                      64M     0   64M   0% /dev
tmpfs                     2.0G     0  2.0G   0% /sys/fs/cgroup
10.102.81.126:6789:/test   31G     0   31G   0% /mnt
/dev/vda1                  28G  5.6G   21G  22% /etc/hosts
shm                        64M     0   64M   0% /dev/shm
tmpfs                     1.0G   12K  1.0G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                     2.0G     0  2.0G   0% /proc/acpi
tmpfs                     2.0G     0  2.0G   0% /proc/scsi
tmpfs                     2.0G     0  2.0G   0% /sys/firmware
sh-4.4# touch /mnt/a
sh-4.4# echo testing > /mnt/a
sh-4.4# cat /mnt/a
testing
sh-4.4# 

Madhu-1 · Oct 10 '22

I completely uninstalled the old Helm release, installed 3.7.1, created a PV and PVC for a different volume in a different namespace, and created a pod using that PVC. It still didn't work.

The nodeplugin logs show that the global and pod mounts were successful, and I checked from the host that they work, but the pod still times out waiting for the condition.

I'm not sure where to check anymore.

thrashwerk · Oct 10 '22

> I completely uninstalled the old Helm release, installed 3.7.1, created a PV and PVC for a different volume in a different namespace, and created a pod using that PVC. It still didn't work.
>
> The nodeplugin logs show that the global and pod mounts were successful, and I checked from the host that they work, but the pod still times out waiting for the condition.
>
> I'm not sure where to check anymore.

The kubelet root path is /var/lib/kubelet, right? Checking the kubelet logs might help you. At least what I can see from your nodeplugin logs is that cephcsi has attached the volume to the requested targetPath (that is cephcsi's job).
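
A sketch of what that kubelet-side check could look like on the node (assuming kubelet runs as a systemd unit named kubelet; the volume name is the one from this thread):

# Pull recent kubelet messages mentioning the volume or the mount steps
journalctl -u kubelet --since "1 hour ago" | grep -iE 'data-platform-cephfs-read-smaug|MountVolume|FailedMount'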

Madhu-1 · Oct 10 '22

I'm experiencing the same issue. What debug information can I provide to help resolve it?

  • cephcsi: v3.7.2
  • Kubernetes: v1.24
  • Ceph: v17.2.4

The PV is provisioned and the PVC is bound.

$ kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
cephfs-pvc       Bound    pvc-b79ec35a-be89-452d-a85e-0e470f231fff   1Gi        RWX            deploi-file-storage-sc     64s

Pod does not start

$ kubectl get pods
NAME                                           READY   STATUS                   RESTARTS      AGE
cephfs-demo-pod-2                              0/1     ContainerCreating        0             2m4s

These events show up in the pod description:

$ kubectl describe pod cephfs-demo-pod-2
....
Events:
  Type     Reason              Age    From                     Message
  ----     ------              ----   ----                     -------
  Normal   Scheduled           2m25s  default-scheduler        Successfully assigned default/cephfs-demo-pod-2 to deploikubeworker1
  Warning  FailedAttachVolume  25s    attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b79ec35a-be89-452d-a85e-0e470f231fff" : timed out waiting for external-attacher of cephfs.csi.ceph.com CSI driver to attach volume 0001-0024-1ea33904-4bad-11ed-b842-2d991d088396-0000000000000003-765fa4aa-55ec-11ed-b876-4698c7018e48
  Warning  FailedMount         22s    kubelet                  Unable to attach or mount volumes: unmounted volumes=[mycephfs2], unattached volumes=[mycephfs2 kube-api-access-kgmcn]: timed out waiting for the condition
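
Since the first warning above comes from the attach path rather than the mount path, it may help to check whether an attach is even expected for this driver (a sketch; cephfs.csi.ceph.com is the driver name registered earlier in this thread, and the PV name is the one from the events above):

# If this prints "true", the attach-detach controller creates a VolumeAttachment
# and waits for an external-attacher to mark it attached before kubelet mounts
kubectl get csidriver cephfs.csi.ceph.com -o jsonpath='{.spec.attachRequired}{"\n"}'

# Any VolumeAttachment for this PV that never reaches ATTACHED=true is what
# the FailedAttachVolume event is timing out on
kubectl get volumeattachment | grep pvc-b79ec35a-be89-452d-a85e-0e470f231fff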

martinfjohansen · Oct 27 '22

@martinfjohansen Is your Ceph cluster healthy? If yes, please open a new issue with the required logs. I am closing this one since we did not get the logs that were requested.

Madhu-1 · Oct 27 '22

Sure, I will open a new ticket. What are the required logs?

martinfjohansen · Oct 27 '22

> Sure, I will open a new ticket. What are the required logs?

Just fill in the issue template; it covers all the required information.

Madhu-1 · Oct 27 '22

Completed? What is completed? Ignorance of the bug?

Sorry, but I can't keep breaking the cluster every day to get the logs.

thrashwerk · Oct 27 '22

> The kubelet root path is /var/lib/kubelet, right? Checking the kubelet logs might help you. At least what I can see from your nodeplugin logs is that cephcsi has attached the volume to the requested targetPath (that is cephcsi's job).

@thrashwerk Please check the above comment. Cephcsi's job is to mount the volume to the targetPath requested by the kubelet; you should check the kubelet logs to see what is happening after that, which is outside of cephcsi.

Madhu-1 · Oct 27 '22

I have now posted logs over in a new issue #3476

martinfjohansen · Oct 27 '22