
AWS FSx ONTAP: Controller unable to delete PV when PVC is deleted

Open sergeykuperman opened this issue 6 months ago • 2 comments

Describe the bug

I have configured Trident to work against AWS FSx, with the following TridentBackendConfig:

kind: TridentBackendConfig
metadata:
  finalizers:
  - trident.netapp.io
  name: trident-fs-066809a7fa76cc0b1
  namespace: trident
spec:
  aws:
    apiRegion: eu-central-1
    fsxFileSystemId: fs-066809a7fa76cc0b1
  backendName: fs-066809a7fa76cc0b1
  credentials:
    name: <redacted>
    type: awsarn
  dataLIF: iscsi.svm-02e7b9d3f4d03cdfe.fs-066809a7fa76cc0b1.fsx.eu-central-1.amazonaws.com
  managementLIF: svm-02e7b9d3f4d03cdfe.fs-066809a7fa76cc0b1.fsx.eu-central-1.amazonaws.com
  sanType: nvme
  storage:
  - defaults:
      encryption: "false"
      luksEncryption: "true"
      spaceReserve: none
      unixPermissions: "0755"
    labels:
      luks: "true"
      storagetype: devspace-nvme-encrypted
    poolName: default
  storageDriverName: ontap-san
  svm: svm-a76cc0b1
  version: 1
status:
  backendInfo:
    backendName: fs-066809a7fa76cc0b1
    backendUUID: ad85fa81-c6e1-42a5-b129-1f90f671f944
  deletionPolicy: delete
  lastOperationStatus: Success
  message: Backend 'fs-066809a7fa76cc0b1' updated
  phase: Bound

and the following StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  labels:
    app.kubernetes.io/instance: trident
    app.kubernetes.io/name: trident
    argocd.argoproj.io/instance: trident
  name: netapp-nvme-encrypted
parameters:
  csi.storage.k8s.io/node-expand-secret-name: luks-${pvc.name}
  csi.storage.k8s.io/node-expand-secret-namespace: ${pvc.namespace}
  csi.storage.k8s.io/node-publish-secret-name: luks-${pvc.name}
  csi.storage.k8s.io/node-publish-secret-namespace: ${pvc.namespace}
  csi.storage.k8s.io/node-stage-secret-name: luks-${pvc.name}
  csi.storage.k8s.io/node-stage-secret-namespace: ${pvc.namespace}
  selector: luks=true
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

I created a Secret containing the LUKS passphrase for the PVC, then the PVC and a pod, and the volume was provisioned and bound successfully. However, when I try to delete the PVC, the PV fails to delete, and I see the following log in the controller:

time="2025-06-09T07:21:42Z" level=error msg="Unable to delete volume from backend." backendUUID=ad85fa81-c6e1-42a5-b129-1f90f671f944 error="error checking for existing FSx volume: volume ARN /vol/trident_pvc_0191b352_25cb_479f_b864_5fffc21d6f84/namespace0 is invalid" logLayer=core requestID=5a02db82-fccb-4901-8a7d-72b9c995b496 requestSource=CSI volume=pvc-0191b352-25cb-479f-b864-5fffc21d6f84 workflow="volume=delete"
time="2025-06-09T07:21:42Z" level=debug msg="Could not delete volume." error="error checking for existing FSx volume: volume ARN /vol/trident_pvc_0191b352_25cb_479f_b864_5fffc21d6f84/namespace0 is invalid" logLayer=csi_frontend requestID=5a02db82-fccb-4901-8a7d-72b9c995b496 requestSource=CSI volumeName=pvc-0191b352-25cb-479f-b864-5fffc21d6f84 workflow="volume=delete"
time="2025-06-09T07:21:42Z" level=debug msg="<<<< DeleteVolume" Method=DeleteVolume Type=CSI_Controller logLayer=csi_frontend requestID=5a02db82-fccb-4901-8a7d-72b9c995b496 requestSource=CSI workflow="volume=delete"
time="2025-06-09T07:21:42Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error checking for existing FSx volume: volume ARN /vol/trident_pvc_0191b352_25cb_479f_b864_5fffc21d6f84/namespace0 is invalid" logLayer=csi_frontend requestID=5a02db82-fccb-4901-8a7d-72b9c995b496 requestSource=CSI

Environment

Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: 25.02.1
  • Trident installation flags used: -n trident
  • Container runtime: containerd2
  • Kubernetes version: 1.30.12
  • Kubernetes orchestrator: Gardener (SAP)
  • Kubernetes enabled feature gates: NodeSwap
  • OS: Garden Linux 1592.4 (https://github.com/gardenlinux/gardenlinux)
  • NetApp backend types: AWS FSx ONTAP
  • Other:

To Reproduce

Create an FSx ONTAP file system, install Trident, apply the TridentBackendConfig posted above, create a PVC with LUKS using the StorageClass posted above, create a pod that uses the PVC, then delete the PVC.
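For readers who want to retrace these steps, the manifests might look roughly like the sketch below. It is not taken from the original report: the names `demo-pvc` and `demo-pod`, the `default` namespace, the passphrase values, the volume size, and the `busybox` image are all assumptions. The Secret name follows the `luks-${pvc.name}` template from the StorageClass above, and the `luks-passphrase-name` / `luks-passphrase` keys are the ones described in the Trident LUKS documentation.

```yaml
# Sketch only: every name, size, and passphrase here is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: luks-demo-pvc          # resolves from luks-${pvc.name} for a PVC named demo-pvc
  namespace: default           # must match ${pvc.namespace}
stringData:
  luks-passphrase-name: A
  luks-passphrase: change-me
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: netapp-nvme-encrypted   # StorageClass from this report
  resources:
    requests:
      storage: 1Gi
---
# A consumer pod is needed because the StorageClass uses WaitForFirstConsumer.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  namespace: default
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-pvc
```

After the pod is running, deleting the pod and then the PVC should trigger the `DeleteVolume` call shown in the controller log above.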

Expected behavior

The PV and the underlying volume are deleted successfully.

Additional context

If I manually delete the volume from AWS FSx, PV deletion proceeds as expected.

sergeykuperman, Jun 09 '25 07:06

We are experiencing the same issue. Is there any update?

GabiKalaora, Jul 30 '25 15:07

Please open a support case with logs included. We do not currently test with Garden Linux or NodeSwap.

torirevilla, Aug 29 '25 14:08