CSI-S3 fails after a few hours of inactivity

Open · uzhinskiy opened this issue 2 years ago · 1 comment

Hello. We are trying to use CSI-S3 with geesefs as the storage backend for Elasticsearch. This Elasticsearch instance serves as a snapshot checker, so it is idle most of the time and processes no data. We noticed that after a few hours of inactivity, all I/O operations in the Elasticsearch pod fail, and the following lines appear in the log of kube-system/csi-s3-XXX:

E0329 12:21:59.786708      1 utils.go:101] GRPC error: rpc error: code = Internal desc = Unmount failed: exit status 32
Unmounting arguments: /var/lib/kubelet/pods/44d8a275-2b1d-4236-8d6a-ba6f4d709b60/volumes/kubernetes.io~csi/pvc-69e61d54-8b2a-420d-b1b3-0260b790d33e/mount
Output: umount: /var/lib/kubelet/pods/44d8a275-2b1d-4236-8d6a-ba6f4d709b60/volumes/kubernetes.io~csi/pvc-69e61d54-8b2a-420d-b1b3-0260b790d33e/mount: not mounted

After we manually restarted the pod, everything worked again. We suspect the cause is a network disruption that terminates the TCP connection, which is then not re-established once the network recovers.
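
To tell a cleanly unmounted path apart from a FUSE mount whose backing process has gone away, a small check like the one below could be run on the node against the pod's volume path. This is only a sketch and not part of CSI-S3: the function name `check_mount` and its return strings are hypothetical, and it assumes the usual Linux behavior where accessing a FUSE mount with a dead daemon fails with ENOTCONN ("Transport endpoint is not connected").

```python
import errno
import os


def check_mount(path: str) -> str:
    """Classify a path as 'mounted', 'not_mounted', 'stale' or 'missing'.

    Hypothetical helper for diagnosis only; not part of CSI-S3 or geesefs.
    """
    try:
        # stat() must come first: os.path.ismount() swallows OSError and
        # would report a stale FUSE mount simply as "not a mount point".
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            # Assumption: the FUSE daemon serving this mount has exited.
            return "stale"
        if exc.errno == errno.ENOENT:
            return "missing"
        raise
    return "mounted" if os.path.ismount(path) else "not_mounted"


if __name__ == "__main__":
    import sys

    # Usage: python check_mount.py <path>; defaults to "/" when no path is given.
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(target, "->", check_mount(target))
```

A result of "not_mounted" would match the `umount: ... not mounted` output above, while "stale" would point to the mount still being registered but no longer served.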

How can we prevent this behavior in CSI-S3?

Thank you.

uzhinskiy · Mar 29 '22 13:03