
Handling geesefs crashes in the CSI provider

Open enp opened this issue 1 year ago • 5 comments

I see many cases in the geesefs code that end in a panic, and I can sometimes (not always) reproduce at least one of them - https://github.com/yandex-cloud/geesefs/issues/98

Maybe such cases should be handled by the CSI provider? I installed the driver with helm install --namespace s3 --set secret.accessKey=<...> --set secret.secretKey=<..> csi-s3 yandex-s3/csi-s3 on Yandex Managed Kubernetes with Yandex Object Storage and default options. After a crash I see geesefs-enp_2dstorage.service: Main process exited, code=exited, status=2/INVALIDARGUMENT and can't use the created PV/PVC anymore.

Or please just explain how to configure automatic recovery from this crash.

enp avatar Dec 27 '23 06:12 enp
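
For reference, a minimal sketch of how the crash can be inspected on the affected node, assuming SSH access to it and that the driver launches geesefs as a transient systemd unit, as the geesefs-enp_2dstorage.service log line above suggests (the unit name is per-volume and will differ in other setups):

    # List geesefs mount units present on the node (names are per-volume).
    systemctl list-units 'geesefs-*' --all --no-pager

    # Show the last log lines of the crashed unit, including the Go panic.
    journalctl -u geesefs-enp_2dstorage.service -n 100 --no-pager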

I already answered in that issue, but I'll repeat it here too: there's no good way to restore dead FUSE mounts in the CSI driver (I tried some options). The kernel leaves them in a broken "transport endpoint not connected" state, and Kubernetes can't repair them - it would have to unmount them first, but it fails to even check the mountpoint when it's broken.

vitalif avatar Dec 27 '23 09:12 vitalif
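
To illustrate the broken state described above, a small sketch (not part of the driver) that probes the node's FUSE mountpoints and flags the ones the kernel has left disconnected; matching on all fuse* filesystem types is an assumption about how the geesefs mounts appear in /proc/mounts:

    # Probe every FUSE mountpoint on the node; a dead geesefs mount fails
    # stat with "Transport endpoint is not connected" (ENOTCONN).
    awk '$3 ~ /^fuse/ {print $2}' /proc/mounts | while read -r mnt; do
        stat "$mnt" >/dev/null 2>&1 || echo "broken mount: $mnt"
    done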

So something like fusermount -f, or even a node reboot (both with data loss), is the only option for recovery?

enp avatar Dec 27 '23 10:12 enp

Just find the bad mountpoint and do a regular unmount: umount /var/lib/kubernetes/... If it fails with "device or resource busy", it means that some app is still holding an open file descriptor on it - find and kill that app/pod and retry. Or you can use umount -l; then it will be detached immediately and cleaned up in the background.

vitalif avatar Dec 27 '23 11:12 vitalif
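
A rough shell sketch of that recovery flow; the mountpoint path is an example (on most nodes CSI volumes are mounted under /var/lib/kubelet), and the /proc scan is just one way to find the processes keeping the mount busy:

    # Example path of the broken mountpoint found by the probe above.
    MOUNTPOINT=/var/lib/kubelet/pods/.../volumes/...

    # Try a regular unmount first.
    umount "$MOUNTPOINT"

    # If it fails with "device or resource busy", some process still holds an
    # open file descriptor inside the mount; list the offenders via /proc
    # (fuser/lsof may fail here because they cannot stat the dead mountpoint).
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in "$MOUNTPOINT"*) echo "open: $fd -> $target";; esac
    done

    # Kill the offending app/pod and retry, or detach lazily and let the
    # kernel clean it up in the background.
    umount -l "$MOUNTPOINT"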

No need to do anything with the PV/PVC in this case? After the umount I can just delete the pod, and it will be recreated by the deployment with a new name and the volume mounted, right?

enp avatar Dec 27 '23 13:12 enp

Yes

vitalif avatar Dec 27 '23 19:12 vitalif
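
For completeness, the last step sketched as commands; <pod> and <namespace> are placeholders for the affected pod and its namespace:

    # Delete the pod that was using the broken mount; its Deployment
    # (via the ReplicaSet) recreates it under a new name.
    kubectl delete pod <pod> -n <namespace>

    # Watch until the new pod is Running with the volume mounted again.
    kubectl get pods -n <namespace> -w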