csi/node_server.go hides underlying cause of LUKS open failures/retries

Open sergeykuperman opened this issue 2 weeks ago • 0 comments

I have an issue in my cluster (Trident 25.06, ontap-san-economy driver, AWS FSx for ONTAP filesystem) where volume attaches fail and retry for a long time before finally succeeding. I cannot get to the root cause of those failures because node_server.go hides the underlying cause of the LUKS open failure: https://github.com/NetApp/trident/blob/master/frontend/csi/node_server.go#L1857-L1861

The only indication I see is multiple "could not set LUKS volume passphrase" events in the namespace where the attach is happening:

LAST SEEN   TYPE      REASON              OBJECT                                                 MESSAGE
32s         Warning   FailedScheduling    pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling.
24s         Warning   FailedScheduling    pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling.
20s         Normal    Scheduled           pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    Successfully assigned ws-ns-workspaces-ws-hbisv/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp to ip-10-250-208-253.eu-central-1.compute.internal
32s         Normal    SuccessfulCreate    replicaset/workspaces-ws-hbisv-deployment-54486d45dc   Created pod: workspaces-ws-hbisv-deployment-54486d45dc-rd9xp
33s         Normal    ScalingReplicaSet   deployment/workspaces-ws-hbisv-deployment              Scaled up replica set workspaces-ws-hbisv-deployment-54486d45dc to 1
35s         Normal    NoPods              poddisruptionbudget/ws-hbisv-pdb                       No matching pods found
0s          Normal    SuccessfulAttachVolume   pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    AttachVolume.Attach succeeded for volume "pp-consume-1dec7b6b5810"
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase

Another indicator is that the tridentctl get volumes command takes more than a minute to return a response. The cluster has around 3800 tridentvolumes.

Eventually (after 5-10 minutes) the attach succeeds, so the passphrase secret and the passphrase itself, which have not changed and exist at the time of the attach, are correct.

Attaching the CSI node server logs:

trident-node.log

Please advise on how I can troubleshoot this issue.

Dec 15 '25 15:12 sergeykuperman