ontap-iscsi: Cannot recover from network issues

Open msau42 opened this issue 3 years ago • 0 comments

Describe the bug

  1. We had a network misconfiguration in our environment, so the iSCSI session could not connect, which resulted in MountDevice failures.
  2. Bug 1: After we fixed the network issues, iSCSI still could not connect because CHAP was not registered.
time="2022-02-18T23:29:57Z" level=error msg="Error running iscsiadm login." requestID=ab061b1a-3ca9-4775-809c-2f258a4d94ac requestSource=CSI
time="2022-02-18T23:29:57Z" level=error msg="Failed to login with CHAP credentials: exit status 8 " requestID=ab061b1a-3ca9-4775-809c-2f258a4d94ac requestSource=CSI
  3. To register CHAP, we restarted all Trident controller and node pods.
  4. Bug 2: After the restart, the Trident node no longer shows CHAP errors, but it now reports that volumePublishInfo.json is missing. I believe this is expected, since the previous MountDevice calls never succeeded.
time="2022-02-18T23:42:02Z" level=error msg="GRPC error: rpc error: code = FailedPrecondition desc = unable to read the staging target /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d843f98a-ac7a-4144-b922-2fba478e1757/globalmount; open /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d843f98a-ac7a-4144-b922-2fba478e1757/globalmount/volumePublishInfo.json: no such file or directory" requestID=897c2e6b-03fa-4e48-aa43-8545ac9b78ce requestSource=CSI
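
To illustrate the kind of handling Bug 2 seems to call for, here is a minimal sketch (in Python, not Trident's actual Go code) that tells "the staging file was never written" apart from other read errors. The function name `read_publish_info` is hypothetical. Only the file name `volumePublishInfo.json` comes from the log above. What a caller should do on `None` (for example, redo the staging step) is an assumption, not confirmed Trident behavior.

```python
import json
from pathlib import Path
from typing import Optional


def read_publish_info(staging_path: str) -> Optional[dict]:
    """Return the parsed publish info, or None if it was never written.

    Hypothetical helper: a missing file is reported as None so the caller
    can treat the volume as "not staged yet" instead of failing outright.
    Any other error (permissions, malformed JSON) still propagates.
    """
    info_file = Path(staging_path) / "volumePublishInfo.json"
    try:
        with info_file.open() as f:
            return json.load(f)
    except FileNotFoundError:
        # An earlier staging attempt never completed, so there is nothing to read.
        return None
```

With something like this, a caller could fall back to staging the volume again when the result is `None`, rather than returning FailedPrecondition on every attempt.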

Environment

  • Trident version: v21.01.0
  • Trident installation flags used: custom
  • Container runtime:
  • Kubernetes version: 1.21.5
  • Kubernetes orchestrator: Anthos
  • Kubernetes enabled feature gates:
  • OS: Ubuntu 20.04
  • NetApp backend types: ONTAP 9.9.1
  • Other:

To Reproduce

Expected behavior

  1. A CHAP registration failure should be retried.
  2. Trident should be able to recover from a previous MountDevice failure.
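
As a rough sketch of what "retried" in item 1 could mean, the snippet below wraps an operation in a bounded retry loop with exponential backoff. It is written in Python for illustration only and is not Trident's implementation. `retry_with_backoff`, its parameters, and the default delays are all hypothetical, and `op` stands in for whatever performs the CHAP registration or login.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op() until it succeeds or the attempt budget is exhausted.

    The delay doubles after each failure, capped at max_delay. The last
    failure is re-raised so the caller still sees the underlying error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting. A production version would likely also limit which errors count as retryable.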

Additional context

msau42, Feb 19 '22 00:02