
[BUG] NodeUnpublishVolume fail

Open: maimuderizi opened this issue 1 year ago • 1 comment

What is your environment (Kubernetes version, Fluid version, etc.): Kubernetes v1.22.5, Fluid master

Describe the bug: In csi-nodeplugin-fluid-xxx, I tested FUSE recovery and some corrupted mount points were generated. [image]

When I delete the application pod, the deletion takes a long time, and some unexpected logs appear: [screenshot of logs]. nodeserver.go fails to clean up corrupted mount points when NodeUnpublishVolume is invoked. For a corrupted mount point, the local variable 'notMount' is true, so the unmount loop breaks before Unmount is called, and CleanupMountPoint then fails. Maybe we should set 'notMount' to false in this case so that all corrupted mount points get unmounted.

func (ns *nodeServer) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
    .....
    mounter := mount.New("")
    for {
        notMount, err := mounter.IsLikelyNotMountPoint(targetPath)
        .....
        if err != nil {
            if !mount.IsCorruptedMnt(err) {
                // stat targetPath with unexpected error
                glog.Errorf("NodeUnpublishVolume: stat targetPath %s with error: %v", targetPath, err)
                return nil, status.Errorf(codes.Internal, "NodeUnpublishVolume: stat targetPath %s: %v", targetPath, err)
            } else {
                // targetPath is corrupted
                glog.V(3).Infof("NodeUnpublishVolume: detected corrupted mountpoint on path %s with error %v", targetPath, err)
            }
        }
        if notMount {
            glog.V(3).Infof("NodeUnpublishVolume: umount %s success", targetPath)
            break
        }
        ...
        err = mounter.Unmount(targetPath)
        ...
    }
    ...
    err = mount.CleanupMountPoint(targetPath, mounter, false)
    ...
}
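
The change proposed above (treat a corrupted mount point as still mounted so the loop reaches Unmount) could look roughly like the sketch below. This is only an illustration under assumptions, not the actual Fluid fix: the `mounter` interface, `fakeMounter`, `unmountAll`, `errCorrupted`, and the bounded `maxAttempts` are hypothetical names introduced here, and `isCorruptedMnt` is a simplified stand-in for `mount.IsCorruptedMnt` so that the example runs without Kubernetes dependencies.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// mounter is a hypothetical stand-in for the two mounter methods used in the excerpt.
type mounter interface {
	IsLikelyNotMountPoint(path string) (bool, error)
	Unmount(path string) error
}

// errCorrupted is an example error of the kind a broken FUSE mount returns on stat.
var errCorrupted error = syscall.ENOTCONN

// isCorruptedMnt is a simplified stand-in for mount.IsCorruptedMnt (assumption:
// only a few errno values are checked here).
func isCorruptedMnt(err error) bool {
	return errors.Is(err, syscall.ENOTCONN) ||
		errors.Is(err, syscall.ESTALE) ||
		errors.Is(err, syscall.EIO)
}

// unmountAll repeats the check-then-unmount loop from the excerpt and returns
// how many times Unmount was called. maxAttempts is added only to keep the
// sketch bounded.
func unmountAll(m mounter, targetPath string, maxAttempts int) (int, error) {
	unmounts := 0
	for i := 0; i < maxAttempts; i++ {
		notMount, err := m.IsLikelyNotMountPoint(targetPath)
		if err != nil {
			if !isCorruptedMnt(err) {
				return unmounts, fmt.Errorf("stat targetPath %s: %w", targetPath, err)
			}
			// Proposed change: a corrupted path is still mounted, so do not
			// let notMount == true end the loop before Unmount is called.
			notMount = false
		}
		if notMount {
			return unmounts, nil
		}
		if err := m.Unmount(targetPath); err != nil {
			return unmounts, fmt.Errorf("unmount %s: %w", targetPath, err)
		}
		unmounts++
	}
	return unmounts, fmt.Errorf("%s still mounted after %d attempts", targetPath, maxAttempts)
}

// fakeMounter simulates a path with stacked mounts whose stat call fails.
type fakeMounter struct {
	layers  int   // number of mounts still stacked on the path
	statErr error // error returned by the check while layers > 0
}

func (f *fakeMounter) IsLikelyNotMountPoint(string) (bool, error) {
	if f.layers > 0 && f.statErr != nil {
		// Mirrors the behaviour described in the issue: the stat fails, so
		// notMount is reported as true together with the error.
		return true, f.statErr
	}
	return f.layers == 0, nil
}

func (f *fakeMounter) Unmount(string) error {
	if f.layers == 0 {
		return errors.New("not mounted")
	}
	f.layers--
	return nil
}

func main() {
	m := &fakeMounter{layers: 2, statErr: errCorrupted}
	n, err := unmountAll(m, "/hypothetical/target", 5)
	fmt.Println(n, err)
}
```

Without the `notMount = false` line, the fake mounter above would end the loop on the first iteration with both mounts still in place, which matches the behaviour reported in this issue.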

What you expect to happen: All corrupted mount points should be unpublished successfully

How to reproduce it

Additional Information

maimuderizi, Apr 30 '24 11:04

Hi @maimuderizi, which Fluid version are you using in your cluster? From the log screenshot, I guess it is a Fluid v0.9.X version? There are some bugs in the FUSE Recovery feature in Fluid v0.9.X, and they are fixed in Fluid v1.0.0.

TrafalgarZZZ, May 11 '24 07:05