Pods are unable to mount PVCs (iSCSI block devices)
Recently we faced another issue with our k8s cluster and Trident-provided PVCs (iSCSI).
Environment:
- Trident version: 24.10
- k8s version: v1.30.6
- OS: Flatcar 4081.2.0
- Container runtime: docker://26.1.0
- Kubernetes orchestrator: Rancher (custom cluster)
- NetApp backend types: ONTAP AFF (ONTAP 9.12.1P12)
Problem description:
After recent problems with the cluster, when most of the pods lost access to the NetApp block volumes (for a reason that is still unknown), we are still facing strange issues with volume attachment. Once a pod is restarted (even on another node), it hangs in the Init: 0/1 state. In the describe command output we can see errors like:
- Multi-Attach error for volume "pvc-2f31cac3-3fd2-4730-98de-9f06b5513551" Volume is already exclusively attached to one node and can't be attached to another
We observe three scenarios for the Multi-Attach error above:
A) A VolumeAttachment object exists, but it references a different node than the one the pod is scheduled to
B) A VolumeAttachment object exists, and its node is the one the pod is scheduled to
C) No VolumeAttachment object exists at all, and k8s does not create one
As a result, the pod does not start (it never transitions to the Running state) and the cluster is unable to resolve the situation on its own. The only way to at least attempt a fix is to manually delete the VolumeAttachment related to the PVC that the pod wants to mount. After that, one of two things may happen:
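To tell the three scenarios apart quickly, something like the following sketch might help. It assumes the output of `kubectl get volumeattachments -o json` has already been parsed into a dict, and that the PV name and the pod's node name are known; the function name `classify_attachment` and the sample node names are made up for illustration.

```python
def classify_attachment(va_list, pv_name, pod_node):
    """Return 'A', 'B' or 'C' for the scenarios listed above.

    va_list is the parsed JSON from `kubectl get volumeattachments -o json`.
    """
    # Collect the nodes of every VolumeAttachment that points at this PV.
    nodes = [
        item["spec"]["nodeName"]
        for item in va_list.get("items", [])
        if item["spec"].get("source", {}).get("persistentVolumeName") == pv_name
    ]
    if not nodes:
        return "C"  # no VolumeAttachment exists for the PV
    if pod_node in nodes:
        return "B"  # attachment exists on the node the pod is scheduled to
    return "A"      # attachment exists, but only on other nodes


# Hypothetical sample: one attachment for the PV from the error, on "node-1".
sample = {
    "items": [
        {
            "spec": {
                "nodeName": "node-1",
                "source": {
                    "persistentVolumeName": "pvc-2f31cac3-3fd2-4730-98de-9f06b5513551"
                },
            }
        }
    ]
}
print(classify_attachment(sample, "pvc-2f31cac3-3fd2-4730-98de-9f06b5513551", "node-2"))  # prints A
```

This only inspects the Kubernetes objects; it says nothing about the actual iSCSI session or multipath state on the nodes.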
A) Nothing: the pod cannot start and no further events are available:
Events: Type Reason Age From Message
Normal Scheduled 116s default-scheduler Successfully assigned X/X-pool-0-3 to node-X
B) After several attempts (~5 VolumeAttachment removals), the pod is finally able to start.
Occasionally, an error about a device not being found is also present:
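The manual workaround described above (deleting the VolumeAttachment objects for the affected PV) could be prepared roughly as follows. This is a sketch only: it assumes the same parsed `kubectl get volumeattachments -o json` input, the helper name `delete_commands` is hypothetical, and it merely builds the `kubectl delete volumeattachment` command lines for review instead of running them.

```python
import shlex


def delete_commands(va_list, pv_name):
    """Build one `kubectl delete volumeattachment` command per attachment of pv_name."""
    return [
        "kubectl delete volumeattachment " + shlex.quote(item["metadata"]["name"])
        for item in va_list.get("items", [])
        if item.get("spec", {}).get("source", {}).get("persistentVolumeName") == pv_name
    ]
```

Printing the commands first, rather than executing them directly, leaves room to check that the attachment really is stale before removing it.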
- MountVolume.MountDevice failed for volume "pvc-" : rpc error: code = Internal desc = rpc error: code = Internal desc = failed to stage volume: multipath device not found when it is expected
The cluster worker nodes were rebooted, and iSCSI and multipathd seem to be healthy. However, the whole cluster seems to be struggling with PVC attachments.
Hi @ptrkmkslv, we will need to investigate further to identify the root cause. Can you please open a support case so that we can collect all the required info for troubleshooting?
Has the NetApp storage been upgraded? We had exactly the same situation when updating NetApp MetroCluster to version 9.12.1. The problem corresponds to https://kb.netapp.com/Cloud/Astra/Trident/Trident_iSCSI_paths_do_not_recover_after_ONTAP_Upgrade ^^ Unfortunately, a rescan or reboot of the k8s nodes doesn't help. Only touching the VolumeAttachment resources helped...
@ptrkmkslv Do you have any updates to this issue, or has it been resolved?