Trident is trying to unmount volumes that no longer exist
Hello, we are observing a lot of the following entries in /var/log/messages on one of the nodes (the file reached 1.9 GB in a few days):
Oct 7 14:11:31 saminio0913 kubelet: E1007 14:11:31.596665 37204 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : UnmountVolume.NewUnmounter failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json]: open /var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json: no such file or directory
The PV is not present in Kubernetes, is not visible in the Trident volume list, and the LUN for the volume is not exported. Only leftovers remain in the /var/lib/kubelet/plugins/kubernetes.io/csi/pv/ directory on the node. Pod 2c80a1fe-25ba-4fab-a031-9b4594eded9e does not exist in any namespace either. It looks like Trident is trying to unmount a volume that in fact no longer exists.
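For reference, the leftover state can be confirmed directly on the node. A minimal sketch, using the pod UID and PVC name from the log above (the tridentctl namespace is taken from our install flags):

POD_UID="2c80a1fe-25ba-4fab-a031-9b4594eded9e"
PVC="pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0"
# The per-pod volume directory may remain, but the vol_data.json kubelet needs is gone:
ls -l "/var/lib/kubelet/pods/${POD_UID}/volumes/kubernetes.io~csi/${PVC}/"
# Confirm the PV is gone from both Kubernetes and Trident:
kubectl get pv "${PVC}"
tridentctl -n trident-system get volume "${PVC}"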
Environment
- Trident version: 20.07.1
- Trident installation flags used: -d -n trident-system --use-custom-yaml
- Container runtime: Docker 19.03.13
- Kubernetes version: v1.19.2
- OS: RHEL 7.8
- NetApp backend types: ontap-san
To Reproduce
We noticed the issue because /var/log/messages on one of the nodes grew to more than 2 GB in a few days.
Expected behavior
All information about a PV is removed from the node and from Trident after the PV is deleted.
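Until that happens, a possible stop-gap is removing the stale per-pod directory by hand. A hedged sketch, not an official procedure; it assumes you have already verified that the pod is gone and that nothing under the path is still mounted:

POD_UID="2c80a1fe-25ba-4fab-a031-9b4594eded9e"
# Only remove the leftovers if no mount for this pod UID is still active:
if ! grep -q "${POD_UID}" /proc/mounts; then
  rm -rf "/var/lib/kubelet/pods/${POD_UID}/volumes/kubernetes.io~csi"
fi

If the reconciler errors stem only from the orphaned directory, kubelet should stop retrying the unmount once it is gone.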
Additional context
In /var/log/messages we have a lot of entries like these:
Oct 7 14:24:05 saminio0913 kubelet: E1007 14:24:05.506481 37204 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : UnmountVolume.NewUnmounter failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json]: open /var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json: no such file or directory
Oct 7 14:24:05 saminio0913 kubelet: E1007 14:24:05.607452 37204 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : UnmountVolume.NewUnmounter failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json]: open /var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json: no such file or directory
Oct 7 14:24:05 saminio0913 kubelet: E1007 14:24:05.708631 37204 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : UnmountVolume.NewUnmounter failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0") pod "2c80a1fe-25ba-4fab-a031-9b4594eded9e" (UID: "2c80a1fe-25ba-4fab-a031-9b4594eded9e") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json]: open /var/lib/kubelet/pods/2c80a1fe-25ba-4fab-a031-9b4594eded9e/volumes/kubernetes.io~csi/pvc-b50de847-ae4b-4b1e-9db9-ee615b8562a0/vol_data.json: no such file or directory
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772436 37204 clientconn.go:106] parsed scheme: ""
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772456 37204 clientconn.go:106] scheme "" not registered, fallback to default scheme
Oct 7 14:24:05 saminio0913 kubelet: I1007 14:24:05.772516 37204 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/csi.trident.netapp.io/csi.sock
In the csi-attacher logs we can find a lot of entries concerning missing volumes:
I1007 12:25:52.141170 1 csi_handler.go:267] Detaching "csi-dd51bacc1cfbd0435a92d1ea8e05f9fb027ce4bd8cd3b7015dbfcc8d040ea158"
I1007 12:25:52.341337 1 csi_handler.go:219] Error processing "csi-454b489606a09a31102c79d7bef04210c29fe6982abb3453cea3ab2425c2973e": failed to detach: persistentvolume "pvc-45752f8a-7e43-421f-9633-1b435469ada3" not found
I1007 12:25:52.341410 1 csi_handler.go:267] Detaching "csi-1fa39a0df001bbe261b8b2eef66e6a6a715073b7e96f4f1a795ade387a3a2920"
I1007 12:25:52.541777 1 csi_handler.go:219] Error processing "csi-73a5247d6e46e817f4a029d13c1ca5c3c13b0f08aca95288bd483ad9b9b59c8b": failed to detach: persistentvolume "pvc-0e650e23-cf14-44dd-8e1a-ee8c26fd691f" not found
I1007 12:25:52.542029 1 csi_handler.go:267] Detaching "csi-aeb9414b9d8415e4093eb2beff8d1bd696be7bdd0068a0366ca3764bbff6c296"
I1007 12:25:52.740658 1 csi_handler.go:219] Error processing "csi-71934b29a3f70a324fc0106345abc551b8df98b32081e25317c0d961f1cf4369": failed to detach: persistentvolume "pvc-54b2bfcd-6032-4077-9067-280a0135f46b" not found
I1007 12:25:52.740750 1 csi_handler.go:267] Detaching "csi-47e9322037e0d4e0fe69781cb72b0e9cd72179912e6124cc1d4319f8e1915154"
I1007 12:25:52.940502 1 csi_handler.go:219] Error processing "csi-180c6640581376335d936d503c041d942de29af84f596c3834847f4beccc91ae": failed to detach: persistentvolume "pvc-a8156fa7-64c4-4b8d-9703-fb03d0704635" not found
I1007 12:25:52.940697 1 csi_handler.go:267] Detaching "csi-22d827840d95cd7e2d5da71c2444565cee90357a064e86b4b6cc24249c5986e1"
I1007 12:25:53.141603 1 csi_handler.go:219] Error processing "csi-ef3f07631ba98cd6392b3426bdcfbcc4594d3b776f4b94590f4b4d1f390be3de": failed to detach: persistentvolume "pvc-d464b52c-e012-41cd-bb33-69fc3e36b090" not found
I1007 12:25:53.141709 1 csi_handler.go:267] Detaching "csi-0c97eec6ea939e16de48b2cc5d009a5fe29233bb47e54e78e33bf5c8b43f1781"
I1007 12:25:53.342296 1 csi_handler.go:219] Error processing "csi-af28f2cf9468f24e8392eebb56eee9cb794362410702b9455452a393d8d579f4": failed to detach: persistentvolume "pvc-b1e5f214-4e16-4efd-88d1-c71e110d461f" not found
I1007 12:25:53.342378 1 csi_handler.go:267] Detaching "csi-eb51b57a8a6d820fbfb6a68b3b6c163d5b9c6684ede7a0997d050ae668f66985"
I1007 12:25:53.542867 1 csi_handler.go:219] Error processing "csi-5d640ddd6d5cf5da6f714375d869573c9ecaf634ae3ad6700ebad90ce2fac9b1": failed to detach: persistentvolume "pvc-cfcd2179-21e8-415f-9ddf-15925b62fd0d" not found
I1007 12:25:53.542975 1 csi_handler.go:267] Detaching "csi-6e1ce04fc21d3ddd16a0bed3f4f560acbbc9a1ab7cbdd1093379974dbb37aa01"
I1007 12:25:53.741522 1 csi_handler.go:219] Error processing "csi-ccf6208b45331730b093e35a7838aa93f13cec22f88f80a80086887803973771": failed to detach: persistentvolume "pvc-b2c0e687-da09-4da6-abce-6eaee074d1f8" not found
I1007 12:25:53.741657 1 csi_handler.go:267] Detaching "csi-164431c374b4f464a3178095b39a156a730a82a9e398a7f10f279466f31ac9cd"
I1007 12:25:53.941470 1 csi_handler.go:219] Error processing "csi-538be594c97e87ab43a87e7ea6ce3e9039c1b077f688273ed113dca1c807cc3f": failed to detach: persistentvolume "pvc-fec2d453-d921-476e-97f0-78912a65b393" not found
I1007 12:25:53.941648 1 csi_handler.go:267] Detaching "csi-65166aee0d924f63a7ca7017f0d7c4250bd1c5811880b38983f698b65e4b4027"
None of the volumes is present in Kubernetes:
$ for i in $(kubectl logs -n trident-system trident-csi-79647c4d4-j9wkx csi-attacher --since=5m | grep Error | cut -d'"' -f4); do kubectl describe pv "$i"; done
Error from server (NotFound): persistentvolumes "pvc-0cbc0ce0-3197-4820-a87e-80179d565def" not found
Error from server (NotFound): persistentvolumes "pvc-b44d0960-92be-4d81-aed1-fa84d4ed93ee" not found
Error from server (NotFound): persistentvolumes "pvc-90bd092a-48d0-44a8-bd22-2a3d70e40cf3" not found
Error from server (NotFound): persistentvolumes "pvc-1340401c-e391-45e7-bb55-70c188e0cf97" not found
Error from server (NotFound): persistentvolumes "pvc-3ee2d55b-edaa-4029-ad43-c843cbb8911b" not found
Error from server (NotFound): persistentvolumes "pvc-533037d5-5b8c-43bf-992d-6b31065f9a98" not found
Error from server (NotFound): persistentvolumes "pvc-988240d8-8fbb-43a1-9405-8abb4c0f3d3e" not found
Error from server (NotFound): persistentvolumes "pvc-53658229-06f5-477a-aa57-5f5730fe7e12" not found
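The stale references can also be enumerated from the VolumeAttachment objects themselves. A sketch, assuming (as the csi-attacher log suggests) that attachments pointing at deleted PVs are what keep the detach loop going:

for va in $(kubectl get volumeattachment -o jsonpath='{.items[*].metadata.name}'); do
  pv=$(kubectl get volumeattachment "$va" -o jsonpath='{.spec.source.persistentVolumeName}')
  # Flag attachments whose PV no longer exists:
  kubectl get pv "$pv" >/dev/null 2>&1 || echo "stale: $va -> $pv"
done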
Same problem here. It makes the /var partition grow too large and triggers pod evictions due to DiskPressure (Evicted / The node had condition: [DiskPressure]). The logs mostly come with HPA auto-scaling behavior, when many pods are dynamically created and terminated (a log-filtering stop-gap is sketched after the configuration below).
I have upgraded to netapp-trident 20.10.0 (latest tag), hoping for a fix; unfortunately it is not fixed yet.
/var/log/messages continuously grows by ~1.2 GB per hour with these messages:
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131509 33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "84d90239-114f-4ff7-b404-f3528a1cf8f3" (UID: "84d90239-114f-4ff7-b404-f3528a1cf8f3") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "84d90239-114f-4ff7-b404-f3528a1cf8f3" (UID: "84d90239-114f-4ff7-b404-f3528a1cf8f3") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/84d90239-114f-4ff7-b404-f3528a1cf8f3/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/84d90239-114f-4ff7-b404-f3528a1cf8f3/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/84d90239-114f-4ff7-b404-f3528a1cf8f3/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131605 33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "fb7a195f-5245-4356-b179-fc45586aa73a" (UID: "fb7a195f-5245-4356-b179-fc45586aa73a") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "fb7a195f-5245-4356-b179-fc45586aa73a" (UID: "fb7a195f-5245-4356-b179-fc45586aa73a") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/fb7a195f-5245-4356-b179-fc45586aa73a/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/fb7a195f-5245-4356-b179-fc45586aa73a/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/fb7a195f-5245-4356-b179-fc45586aa73a/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131678 33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "526cc7aa-fba3-4969-91e3-d03c1295620f" (UID: "526cc7aa-fba3-4969-91e3-d03c1295620f") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "526cc7aa-fba3-4969-91e3-d03c1295620f" (UID: "526cc7aa-fba3-4969-91e3-d03c1295620f") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/526cc7aa-fba3-4969-91e3-d03c1295620f/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/526cc7aa-fba3-4969-91e3-d03c1295620f/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/526cc7aa-fba3-4969-91e3-d03c1295620f/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131749 33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "e3783b2b-2942-48de-825e-bc36ebe848db" (UID: "e3783b2b-2942-48de-825e-bc36ebe848db") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "e3783b2b-2942-48de-825e-bc36ebe848db" (UID: "e3783b2b-2942-48de-825e-bc36ebe848db") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/e3783b2b-2942-48de-825e-bc36ebe848db/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/e3783b2b-2942-48de-825e-bc36ebe848db/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/e3783b2b-2942-48de-825e-bc36ebe848db/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Nov 19 09:25:14 frd3kq-k8s03 kubelet[33077]: E1119 09:25:14.131857 33077 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "83240327-da3b-4d7e-b707-0e17d44ddb1d" (UID: "83240327-da3b-4d7e-b707-0e17d44ddb1d") : UnmountVolume.NewUnmounter failed for volume "fairseq" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb") pod "83240327-da3b-4d7e-b707-0e17d44ddb1d" (UID: "83240327-da3b-4d7e-b707-0e17d44ddb1d") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/83240327-da3b-4d7e-b707-0e17d44ddb1d/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/83240327-da3b-4d7e-b707-0e17d44ddb1d/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json]: open /var/lib/kubelet/pods/83240327-da3b-4d7e-b707-0e17d44ddb1d/volumes/kubernetes.io~csi/pvc-317ee04d-d298-4ddc-82d4-6f1b8b4519fb/vol_data.json: no such file or directory
Current configuration:
- Trident version: 20.10.0
- Trident installation flags used: -n trident
- Container runtime: Docker 19.03.11
- Kubernetes version: v1.19.3
- OS: CentOS 8
- NetApp backend types: ontap-nas
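As mentioned above, a stop-gap against the disk-pressure side effect (not a fix for the underlying bug) is to filter the repeating kubelet error before it reaches /var/log/messages. A sketch assuming rsyslog, which the RHEL/CentOS setups in this thread use:

# /etc/rsyslog.d/30-trident-unmount-noise.conf -- drop only the repeating error line
echo ':msg, contains, "UnmountVolume.NewUnmounter failed" stop' \
  > /etc/rsyslog.d/30-trident-unmount-noise.conf
systemctl restart rsyslog

The entries should still be available via journalctl -u kubelet if they are needed for debugging.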
Hi @appropriate-jan and @ledroide,
Do you have additional steps to reproduce this issue?
If you are still experiencing this issue please contact NetApp support and share your logs with us.
To open a case with NetApp, go to https://www.netapp.com/company/contact-us/support/ and find the appropriate number for your region to call in, or log in. Note: Trident is a product supported by NetApp based on a supported NetApp storage serial number (SN). Open the case on the NetApp storage SN and provide a description of the problem. Be sure to mention that the product is Trident on Kubernetes, provide the details, and mention this GitHub issue. The case will be directed to Trident support engineers for a response.
Same here, yet unknown how to reproduce.
- Kubernetes version (Rancher): 1.19.6
- OS: CentOS 7
- Container runtime: Docker 19.3.12
- Trident version: 20.10.1
Same here.
- Kubernetes version: 1.19.10
- OS: Ubuntu 18.04
- Container runtime: Docker 19.3.8
- Trident version: 21.01.2
Hi @appropriate-jan, @lopf, and @AndreasDeCrinis,
This issue has been fixed in the Trident v23.01.0 release. Changes were made to Trident's iSCSI volume attachment code to handle multiple scenarios that could cause the reported behavior, including deletion of the storage volume outside of Trident while the volume is mounted, a dataLIF that is offline, or a network path that is down.
Thanks for letting us know about this issue; it led to multiple changes in the v23.01 release that improve iSCSI stability.
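For anyone upgrading to pick up the fix, a sketch using the tridentctl installer (the release asset name follows the usual trident-installer naming; adjust the namespace to your install):

wget https://github.com/NetApp/trident/releases/download/v23.01.0/trident-installer-23.01.0.tar.gz
tar -xf trident-installer-23.01.0.tar.gz && cd trident-installer
./tridentctl uninstall -n trident   # uninstall preserves Trident's CRD-backed state
./tridentctl install -n trident     # reinstall at the new version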