[BUG] Pod stuck in Terminating because NodeUnpublishVolume fails
What is your environment (Kubernetes version, Fluid version, etc.)
Describe the bug
The Fluid CSI plugin keeps reporting errors like this:
E0428 15:31:10.496312 9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:33:12.598131 9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:33:12.598150 9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:33:12.598252 9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:35:14.619838 9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:35:14.619857 9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:35:14.619939 9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:37:16.625411 9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:37:16.625428 9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:37:16.625502 9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:39:18.705715 9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:39:18.705732 9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:39:18.705812 9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:41:20.712543 9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:41:20.712559 9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:41:20.712639 9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
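This means that /var/lib/kubelet/pods/.../mount is a broken mount point: lstat on it returns "transport endpoint is not connected", so utils.RemoveSymlink fails and every NodeUnpublishVolume retry returns the same Internal error.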
What you expect to happen: Volume should be unpublished successfully.
How to reproduce it
- Create a dataset and a runtime
- Create a pod that mounts the dataset's volume
- Kill the FUSE pod so that the mount point becomes broken
- Before the mount point is recovered, trigger NodeUnpublishVolume for the pod (e.g. delete the pod)
Additional Information