[BUG] Pod stuck in Terminating because failed to NodeUnpublishVolume

Open TrafalgarZZZ opened this issue 1 year ago • 0 comments

What is your environment (Kubernetes version, Fluid version, etc.)

Describe the bug

The Fluid CSI plugin keeps reporting errors like this, roughly every two minutes:

E0428 15:31:10.496312       9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:33:12.598131       9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:33:12.598150       9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:33:12.598252       9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:35:14.619838       9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:35:14.619857       9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:35:14.619939       9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:37:16.625411       9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:37:16.625428       9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:37:16.625502       9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:39:18.705715       9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:39:18.705732       9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:39:18.705812       9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected
I0428 15:41:20.712543       9 utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0428 15:41:20.712559       9 utils.go:98] GRPC request: {"target_path":"/var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount","volume_id":"default-fluid-data"}
E0428 15:41:20.712639       9 utils.go:101] GRPC error: rpc error: code = Internal desc = NodeUnpublishVolume: remove symlink error lstat targetPath /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount error lstat /var/lib/kubelet/pods/d0c35690-3318-4c89-a13f-f146ae41a2dd/volumes/kubernetes.io~csi/default-fluid-data/mount: transport endpoint is not connected

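This means /var/lib/kubelet/pods/.../mount is a broken mount point. The lstat call on the target path fails with "transport endpoint is not connected", so utils.RemoveSymlink returns an error and every NodeUnpublishVolume retry fails the same way.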

What you expect to happen: Volume should be unpublished successfully.

How to reproduce it

  1. Create a Dataset and a Runtime.
  2. Create a pod that mounts the dataset volume.
  3. Kill the FUSE pod so that the mount point becomes broken.
  4. Before the mount point recovers, trigger NodeUnpublishVolume for the pod (e.g. delete the pod).

Additional Information

TrafalgarZZZ, Apr 28 '24 07:04