
Trident is trying to unmount volumes that no longer exist

Open · appropriate-jan opened this issue 4 years ago · 4 comments

Hello, we are observing a large number of entries like the following in /var/log/messages on one of our nodes (the file reached 1.9 GB in a few days):

Oct 7 14:11:31 saminio0913 kubelet: E1007 14:11:31.596665 37204 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : UnmountVolume.NewUnmounter failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json]: open /var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json: no such file or directory

The PV is not present in Kubernetes, not visible in the Trident volume list, and the LUN for the volume is not exported. Only leftovers remain in the /var/lib/kubelet/plugins/kubernetes.io/csi/pv/ directory on the node. Pod 2c80a1fe-25ba-4fab-a031-9b4594eded9e does not exist in any namespace either. It looks like Trident is trying to unmount a volume that in fact no longer exists.
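Because the same error repeats thousands of times, it helps to reduce the log to the distinct (pod UID, PVC) pairs the kubelet keeps complaining about. This is only a sketch, assuming the log format shown above; `extract_stale_pairs` is a hypothetical helper name, not a Trident tool:

```shell
# Hypothetical helper: reduce the repeated UnmountVolume errors in a log
# file to the distinct "pod-UID pvc-name" pairs they refer to.
extract_stale_pairs() {
  grep -o 'pods/[0-9a-f-]*/volumes/kubernetes.io~csi/pvc-[0-9a-f-]*' "$1" |
    sed 's|pods/||; s|/volumes/kubernetes.io~csi/| |' |
    sort -u
}

# Usage on the affected node:
#   extract_stale_pairs /var/log/messages
```

Each resulting pair points at a /var/lib/kubelet/pods/&lt;pod-UID&gt;/volumes/kubernetes.io~csi/&lt;pvc&gt;/ directory that can then be inspected by hand.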

Environment

  • Trident version: 20.07.1
  • Trident installation flags used: -d -n trident-system --use-custom-yaml
  • Container runtime: 19.03.13
  • Kubernetes version: v1.19.2
  • OS: RHEL 7.8
  • NetApp backend types: ontap-san

To Reproduce

We noticed it because /var/log/messages on one of the nodes grew to more than 2 GB in a few days.

Expected behavior

All information about a PV is removed from the node and from Trident after the PV is deleted.

Additional context

In /var/log/messages we have many entries like these:

Oct 7 14:24:05 saminio0913 kubelet: E1007 14:24:05.506481 37204 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : UnmountVolume.NewUnmounter failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json]: open /var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json: no such file or directory
Oct 7 14:24:05 saminio0913 kubelet: E1007 14:24:05.607452 37204 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : UnmountVolume.NewUnmounter failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json]: open /var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json: no such file or directory
Oct 7 14:24:05 saminio0913 kubelet: E1007 14:24:05.708631 37204 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : UnmountVolume.NewUnmounter failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json]: open /var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json: no such file or directory
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772436 37204 clientconn.go:106] parsed scheme: ""
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772456 37204 clientconn.go:106] scheme "" not registered, fallback to default scheme
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772516 37204 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/csi.trident.netapp.io/csi.sock 0 }] }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772526 37204 clientconn.go:948] ClientConn switching balancer to "pick_first"
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772571 37204 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772637 37204 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002a6e660, {CONNECTING }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.773549 37204 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002a6e660, {READY }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774036 37204 clientconn.go:106] parsed scheme: ""
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774048 37204 clientconn.go:106] scheme "" not registered, fallback to default scheme
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774054 37204 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774061 37204 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/csi.trident.netapp.io/csi.sock 0 }] }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774069 37204 clientconn.go:948] ClientConn switching balancer to "pick_first"
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774095 37204 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774106 37204 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc007dd69e0, {CONNECTING }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774315 37204 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc007dd69e0, {READY }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774767 37204 clientconn.go:106] parsed scheme: ""
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774777 37204 clientconn.go:106] scheme "" not registered, fallback to default scheme
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774789 37204 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/csi.trident.netapp.io/csi.sock 0 }] }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774797 37204 clientconn.go:948] ClientConn switching balancer to "pick_first"
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774815 37204 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774855 37204 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774862 37204 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002a6e9a0, {CONNECTING }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.774975 37204 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc002a6e9a0, {READY }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775249 37204 clientconn.go:106] parsed scheme: ""
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775258 37204 clientconn.go:106] scheme "" not registered, fallback to default scheme
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775260 37204 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775271 37204 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/csi.trident.netapp.io/csi.sock 0 }] }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775278 37204 clientconn.go:948] ClientConn switching balancer to "pick_first"
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775302 37204 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775308 37204 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc007dd6c30, {CONNECTING }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775467 37204 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc007dd6c30, {READY }
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.775750 37204 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

In the csi-attacher logs we can find many entries about missing volumes:

I1007 12:25:52.141170 1 csi_handler.go:267] Detaching "csi-dd51bacc1cfbd0435a92d1ea8e05f9fb027ce4bd8cd3b7015dbfcc8d040ea158"
I1007 12:25:52.341337 1 csi_handler.go:219] Error processing "csi-454b489606a09a31102c79d7bef04210c29fe6982abb3453cea3ab2425c2973e": failed to detach: persistentvolume "pvc-45752f8a-7e43-421f-9633-1b435469ada3" not found
I1007 12:25:52.341410 1 csi_handler.go:267] Detaching "csi-1fa39a0df001bbe261b8b2eef66e6a6a715073b7e96f4f1a795ade387a3a2920"
I1007 12:25:52.541777 1 csi_handler.go:219] Error processing "csi-73a5247d6e46e817f4a029d13c1ca5c3c13b0f08aca95288bd483ad9b9b59c8b": failed to detach: persistentvolume "pvc-0e650e23-cf14-44dd-8e1a-ee8c26fd691f" not found
I1007 12:25:52.542029 1 csi_handler.go:267] Detaching "csi-aeb9414b9d8415e4093eb2beff8d1bd696be7bdd0068a0366ca3764bbff6c296"
I1007 12:25:52.740658 1 csi_handler.go:219] Error processing "csi-71934b29a3f70a324fc0106345abc551b8df98b32081e25317c0d961f1cf4369": failed to detach: persistentvolume "pvc-54b2bfcd-6032-4077-9067-280a0135f46b" not found
I1007 12:25:52.740750 1 csi_handler.go:267] Detaching "csi-47e9322037e0d4e0fe69781cb72b0e9cd72179912e6124cc1d4319f8e1915154"
I1007 12:25:52.940502 1 csi_handler.go:219] Error processing "csi-180c6640581376335d936d503c041d942de29af84f596c3834847f4beccc91ae": failed to detach: persistentvolume "pvc-a8156fa7-64c4-4b8d-9703-fb03d0704635" not found
I1007 12:25:52.940697 1 csi_handler.go:267] Detaching "csi-22d827840d95cd7e2d5da71c2444565cee90357a064e86b4b6cc24249c5986e1"
I1007 12:25:53.141603 1 csi_handler.go:219] Error processing "csi-ef3f07631ba98cd6392b3426bdcfbcc4594d3b776f4b94590f4b4d1f390be3de": failed to detach: persistentvolume "pvc-d464b52c-e012-41cd-bb33-69fc3e36b090" not found
I1007 12:25:53.141709 1 csi_handler.go:267] Detaching "csi-0c97eec6ea939e16de48b2cc5d009a5fe29233bb47e54e78e33bf5c8b43f1781"
I1007 12:25:53.342296 1 csi_handler.go:219] Error processing "csi-af28f2cf9468f24e8392eebb56eee9cb794362410702b9455452a393d8d579f4": failed to detach: persistentvolume "pvc-b1e5f214-4e16-4efd-88d1-c71e110d461f" not found
I1007 12:25:53.342378 1 csi_handler.go:267] Detaching "csi-eb51b57a8a6d820fbfb6a68b3b6c163d5b9c6684ede7a0997d050ae668f66985"
I1007 12:25:53.542867 1 csi_handler.go:219] Error processing "csi-5d640ddd6d5cf5da6f714375d869573c9ecaf634ae3ad6700ebad90ce2fac9b1": failed to detach: persistentvolume "pvc-cfcd2179-21e8-415f-9ddf-15925b62fd0d" not found
I1007 12:25:53.542975 1 csi_handler.go:267] Detaching "csi-6e1ce04fc21d3ddd16a0bed3f4f560acbbc9a1ab7cbdd1093379974dbb37aa01"
I1007 12:25:53.741522 1 csi_handler.go:219] Error processing "csi-ccf6208b45331730b093e35a7838aa93f13cec22f88f80a80086887803973771": failed to detach: persistentvolume "pvc-b2c0e687-da09-4da6-abce-6eaee074d1f8" not found
I1007 12:25:53.741657 1 csi_handler.go:267] Detaching "csi-164431c374b4f464a3178095b39a156a730a82a9e398a7f10f279466f31ac9cd"
I1007 12:25:53.941470 1 csi_handler.go:219] Error processing "csi-538be594c97e87ab43a87e7ea6ce3e9039c1b077f688273ed113dca1c807cc3f": failed to detach: persistentvolume "pvc-fec2d453-d921-476e-97f0-78912a65b393" not found
I1007 12:25:53.941648 1 csi_handler.go:267] Detaching "csi-65166aee0d924f63a7ca7017f0d7c4250bd1c5811880b38983f698b65e4b4027"

None of the volumes is present in Kubernetes:

$ for i in $(kubectl logs -n trident-system trident-csi-79647c4d4-j9wkx csi-attacher --since=5m | grep Error | cut -d"\"" -f4); do kubectl describe pv $i; done
Error from server (NotFound): persistentvolumes "pvc-0cbc0ce0-3197-4820-a87e-80179d565def" not found
Error from server (NotFound): persistentvolumes "pvc-b44d0960-92be-4d81-aed1-fa84d4ed93ee" not found
Error from server (NotFound): persistentvolumes "pvc-90bd092a-48d0-44a8-bd22-2a3d70e40cf3" not found
Error from server (NotFound): persistentvolumes "pvc-1340401c-e391-45e7-bb55-70c188e0cf97" not found
Error from server (NotFound): persistentvolumes "pvc-3ee2d55b-edaa-4029-ad43-c843cbb8911b" not found
Error from server (NotFound): persistentvolumes "pvc-533037d5-5b8c-43bf-992d-6b31065f9a98" not found
Error from server (NotFound): persistentvolumes "pvc-988240d8-8fbb-43a1-9405-8abb4c0f3d3e" not found
Error from server (NotFound): persistentvolumes "pvc-53658229-06f5-477a-aa57-5f5730fe7e12" not found
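The csi-attacher errors above come from VolumeAttachment objects that still reference deleted PVs. As a read-only cross-check (a sketch, not an official Trident procedure; `find_stale_attachments` is a name made up here), the stale attachments can be listed like this:

```shell
# Hypothetical helper: print each VolumeAttachment whose referenced PV
# no longer exists in the cluster. Read-only; assumes kubectl access.
find_stale_attachments() {
  kubectl get volumeattachments \
    -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName' \
    --no-headers |
  while read -r va pv; do
    kubectl get pv "$pv" >/dev/null 2>&1 || echo "stale: $va -> $pv"
  done
}

# Usage:
#   find_stale_attachments
```

Any attachment it prints matches the "failed to detach: persistentvolume ... not found" errors and is a candidate for manual cleanup after verifying nothing still uses it.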

appropriate-jan · Oct 07 '20

Same problem here. It makes the /var partition grow too large and triggers pod evictions due to DiskPressure (Evicted / "The node had condition: [DiskPressure]"). The log entries mostly appear with HPA auto-scaling, when many pods are dynamically created and terminated.

I have upgraded to netapp-trident 20.10.0 (latest tag), hoping for a fix; unfortunately, the issue persists.

/var/log/messages grows continuously, at roughly 1.2 GB per hour, with these messages:

Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131509   33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "84d90239-114f-4ff7-b404-f3528a1cf8f3" (UID: "84d90239-114f-4ff7-b404-f3528a1cf8f3") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "84d90239-114f-4ff7-b404-f3528a1cf8f3" (UID: "84d90239-114f-4ff7-b404-f3528a1cf8f3") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/84d90239-114f-4ff7-b404-f3528a1cf8f3/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/84d90239-114f-4ff7-b404-f3528a1cf8f3/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/84d90239-114f-4ff7-b404-f3528a1cf8f3/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131605   33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "fb7a195f-5245-4356-b179-fc45586aa73a" (UID: "fb7a195f-5245-4356-b179-fc45586aa73a") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "fb7a195f-5245-4356-b179-fc45586aa73a" (UID: "fb7a195f-5245-4356-b179-fc45586aa73a") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/fb7a195f-5245-4356-b179-fc45586aa73a/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/fb7a195f-5245-4356-b179-fc45586aa73a/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/fb7a195f-5245-4356-b179-fc45586aa73a/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131678   33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "526cc7aa-fba3-4969-91e3-d03c1295620f" (UID: "526cc7aa-fba3-4969-91e3-d03c1295620f") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "526cc7aa-fba3-4969-91e3-d03c1295620f" (UID: "526cc7aa-fba3-4969-91e3-d03c1295620f") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/526cc7aa-fba3-4969-91e3-d03c1295620f/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/526cc7aa-fba3-4969-91e3-d03c1295620f/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/526cc7aa-fba3-4969-91e3-d03c1295620f/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131749   33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "e3783b2b-2942-48de-825e-bc36ebe848db" (UID: "e3783b2b-2942-48de-825e-bc36ebe848db") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "e3783b2b-2942-48de-825e-bc36ebe848db" (UID: "e3783b2b-2942-48de-825e-bc36ebe848db") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/e3783b2b-2942-48de-825e-bc36ebe848db/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/e3783b2b-2942-48de-825e-bc36ebe848db/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/e3783b2b-2942-48de-825e-bc36ebe848db/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131857   33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "83240327-da3b-4d7e-b707-0e17d44ddb1d" (UID: "83240327-da3b-4d7e-b707-0e17d44ddb1d") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "83240327-da3b-4d7e-b707-0e17d44ddb1d" (UID: "83240327-da3b-4d7e-b707-0e17d44ddb1d") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/83240327-da3b-4d7e-b707-0e17d44ddb1d/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/83240327-da3b-4d7e-b707-0e17d44ddb1d/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/83240327-da3b-4d7e-b707-0e17d44ddb1d/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
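Until the underlying bug is fixed, one stopgap for the log growth described above is a tighter rotation policy so /var/log/messages cannot fill /var. A sketch only, assuming the default rsyslog setup on RHEL/CentOS; the drop-in file name and thresholds are made up and need tuning for your nodes:

```
# /etc/logrotate.d/messages-cap  (hypothetical drop-in; adjust size/rotate)
/var/log/messages {
    size 500M
    rotate 4
    compress
    missingok
    postrotate
        /bin/kill -HUP $(cat /var/run/syslogd.pid 2>/dev/null) 2>/dev/null || true
    endscript
}
```

This only caps the symptom (disk usage); the kubelet still burns cycles retrying the unmounts.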

Current configuration:

  • Trident version: 20.10.0
  • Trident installation flags used: -n trident
  • Container runtime: 19.03.11
  • Kubernetes version: v1.19.3
  • OS: CentOS 8
  • NetApp backend types: ontap-nas

ledroide · Nov 19 '20

Hi @appropriate-jan and @ledroide,

Do you have additional steps to reproduce this issue?

If you are still experiencing this issue, please contact NetApp support and share your logs with us.

To open a case with NetApp, go to https://www.netapp.com/company/contact-us/support/ and find the appropriate number for your region to call, or log in. Note: Trident is a product supported by NetApp based on a supported NetApp storage serial number (SN). Open the case on the NetApp storage SN and describe the problem. Be sure to mention that the product is Trident on Kubernetes, provide the details, and reference this GitHub issue. The case will be directed to Trident support engineers for response.

gnarl · Jan 25 '21

Same here, yet unknown how to reproduce.

  • Kubernetes version (Rancher): 1.19.6
  • OS: CentOS 7
  • Container runtime: Docker 19.3.12
  • Trident version: 20.10.1

lopf · Apr 21 '21

same here

  • Kubernetes version: 1.19.10
  • OS: Ubuntu 18.04
  • Container runtime: Docker 19.3.8
  • Trident version: 21.01.2

AndreasDeCrinis · Jun 01 '21

Hi @appropriate-jan, @lopf, and @AndreasDeCrinis,

This issue was fixed in the Trident v23.01.0 release. Changes were made to Trident's iSCSI volume attachment code to handle multiple scenarios that could cause the reported behavior, including deleting the storage volume outside of Trident while it is mounted, a dataLIF that is offline, and a network path that is down.

Thanks for letting us know about this issue; it led to multiple iSCSI stability improvements in the v23.01 release.

gnarl · Feb 01 '23