k8s-csi-s3
Handling geesefs crashes in the CSI provider
I see many cases that end in a panic in the geesefs code, and sometimes (not always) I can reproduce at least one of them - https://github.com/yandex-cloud/geesefs/issues/98
Maybe such cases should be handled by the CSI provider? After installing with helm install --namespace s3 --set secret.accessKey=<...> --set secret.secretKey=<..> csi-s3 yandex-s3/csi-s3 in Yandex Managed Kubernetes + Yandex Object Storage with default options, on a crash I see geesefs-enp_2dstorage.service: Main process exited, code=exited, status=2/INVALIDARGUMENT and can't use the created PV/PVC anymore.
Or please just explain how to configure automatic recovery from this crash.
I already answered in that issue, but I'll repeat it here too: there's no good way to restore dead FUSE mounts in the CSI driver (I tried some options). They are left in a broken "transport endpoint is not connected" state by the kernel, and Kubernetes can't repair them - it would have to at least unmount them first, but it fails to even check the mountpoint when it's broken.
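For context, a minimal sketch of how such a dead mount shows up on a node (the loop over /proc/mounts is illustrative; the exact mountpoint paths depend on your kubelet configuration):

```
# List FUSE mounts on the node
grep fuse /proc/mounts

# A dead mount can't even be stat'ed - broken ones fail with
# "Transport endpoint is not connected" instead of reporting normally
for m in $(grep fuse /proc/mounts | awk '{print $2}'); do
    stat "$m" > /dev/null 2>&1 || echo "broken: $m"
done
```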
So, something like fusermount -f or even a node reboot (both with data loss) is the only option for recovery?
Just find the bad mountpoint and do a regular unmount. umount /var/lib/kubernetes/...
If it fails with "device or resource busy", it means that some app is still holding an open file descriptor on it - find and kill that app/pod and retry. Or you can use umount -l: the mount will then be detached immediately and cleaned up in the background.
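Putting that together, a sketch of the recovery steps on the node (the mountpoint path is an example only - take the real one from /proc/mounts; fuser/lsof are just two common ways to find the holding processes):

```
# Example path only - substitute the broken mountpoint from /proc/mounts
MNT=/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount

# Try a regular unmount first
umount "$MNT"

# If it fails with "device or resource busy", find the processes that still
# hold open file descriptors on it, stop those apps/pods, and retry
fuser -vm "$MNT"        # or: lsof +f -- "$MNT"

# As a last resort, detach lazily; the kernel cleans it up in the background
# once the remaining file descriptors are closed
umount -l "$MNT"
```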
No need to do anything with the PV/PVC in this case? After the umount, I can just delete the pod and it will be recreated by the Deployment with a new name and the volume mounted, right?
Yes
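For completeness, that last step sketched out (pod and namespace names are placeholders):

```
# After unmounting the broken mountpoint on the node, delete the affected pod;
# the Deployment recreates it and the CSI driver mounts the volume again
kubectl delete pod <pod-name> -n <namespace>
```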