
In Kubernetes, when killing a pod with a goofys container, the pod gets stuck in Terminating (exit code 137)

Open dpointk opened this issue 1 year ago • 2 comments

We have an issue with pods that run goofys as a sidecar in Kubernetes. When we delete the pod, the goofys container takes a long time to shut down or gets stuck. This is due to the goofys connection to the S3 endpoint not being terminated properly. We weren't able to reproduce it on hyperscalers (AWS, GCP, etc.).

The normal behavior of goofys when sent a SIGKILL is to exit with exit code 137. We're trying to figure out whether there's a way to avoid this exit code in goofys, so that our pods don't get stuck in Terminating and are deleted properly.
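For context, 137 follows the shell convention of 128 plus the signal number, and SIGKILL is signal 9 on Linux. The sketch below is an illustration only, not goofys code: it uses a sleeping Python child process as a stand-in for goofys, and `shell_exit_code` is a hypothetical helper name. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def shell_exit_code(returncode: int) -> int:
    """Map a subprocess return code to the shell/container convention.

    On POSIX, subprocess reports death by signal N as -N;
    shells and container runtimes report it as 128 + N.
    """
    return 128 + (-returncode) if returncode < 0 else returncode


# Stand-in for a long-running process such as a FUSE daemon (assumption).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# SIGKILL cannot be caught, so the child dies without running any cleanup.
child.send_signal(signal.SIGKILL)
child.wait()

code = shell_exit_code(child.returncode)
print(child.returncode, code)
```

Any process killed this way reports the same code, so 137 by itself only says that SIGKILL was delivered, not why.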

goofys version: 0.23.1
Command: goofys -f --dir-mode 0777 --file-mode 0777 -o allow_other --debug_s3 --endpoint https://xxx.xxx.xxx bucket_name /data

The pod is running as privileged in kubernetes.

We checked for OOM kills and can rule them out. The issue also happens with empty buckets and with new pods that were just spun up.

Any assistance will be appreciated.

dpointk, Apr 17 '24 12:04

+1, I can't figure it out yet either. Are you also running in an Alpine-based container, @dpointk?

AsoTora, May 13 '24 08:05

A process cannot handle SIGKILL, so if k8s is sending goofys SIGKILL, there's nothing it can do.
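To illustrate the point, here is a minimal Python sketch (not goofys code, which is written in Go) that checks which signals accept a handler. `on_term` and `can_install_handler` are hypothetical names used only for this example, and it assumes a POSIX system with the code running in the main thread.

```python
import signal


def on_term(signum, frame):
    """Placeholder handler; a real daemon would unmount and exit here."""


def can_install_handler(sig: signal.Signals) -> bool:
    """Return True if the OS lets this process install a handler for sig."""
    try:
        previous = signal.signal(sig, on_term)
    except (OSError, ValueError):
        # The kernel rejects handlers for SIGKILL (and SIGSTOP).
        return False
    if previous is not None:
        # Put the original handler back so the check has no side effects.
        signal.signal(sig, previous)
    return True


results = {
    sig.name: can_install_handler(sig)
    for sig in (signal.SIGTERM, signal.SIGKILL)
}
print(results)
```

Kubernetes sends SIGTERM first and follows up with SIGKILL only after the pod's termination grace period expires, so the SIGTERM path is the only place where a clean shutdown can happen.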

kahing, Jul 24 '24 04:07