ontap-iscsi: Cannot recover from network issues
Describe the bug
- We had a network misconfiguration in our environment, so the iSCSI session could not connect, which resulted in MountDevice failures.
- Bug 1. After we fixed the network issues, iSCSI still could not connect because the CHAP credentials were not registered.
time="2022-02-18T23:29:57Z" level=error msg="Error running iscsiadm login." requestID=ab061b1a-3ca9-4775-809c-2f258a4d94ac requestSource=CSI
time="2022-02-18T23:29:57Z" level=error msg="Failed to login with CHAP credentials: exit status 8 " requestID=ab061b1a-3ca9-4775-809c-2f258a4d94ac requestSource=CSI
- To register the CHAP credentials, we restarted all Trident controller and node pods.
- Bug 2. After the restart, the Trident node no longer shows CHAP errors, but it now reports errors that volumePublishInfo.json is missing. I believe this is expected, since the previous MountDevice calls never succeeded.
time="2022-02-18T23:42:02Z" level=error msg="GRPC error: rpc error: code = FailedPrecondition desc = unable to read the staging target /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d843f98a-ac7a-4144-b922-2fba478e1757/globalmount; open /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d843f98a-ac7a-4144-b922-2fba478e1757/globalmount/volumePublishInfo.json: no such file or directory" requestID=897c2e6b-03fa-4e48-aa43-8545ac9b78ce requestSource=CSI
Environment
- Trident version: v21.01.0
- Trident installation flags used: custom
- Container runtime: not specified
- Kubernetes version: 1.21.5
- Kubernetes orchestrator: Anthos
- Kubernetes enabled feature gates: not specified
- OS: Ubuntu 20.04
- NetApp backend types: ONTAP 9.9.1
- Other:
To Reproduce
Expected behavior
- A CHAP registration failure should be retried.
- Trident should be able to recover from a previous MountDevice failure.
Additional context