kuberay
[Bug] Investigate slow Ray pod termination
Search before asking
- [X] I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
I've noticed that Ray pods often take a long time to terminate (minutes) after deleting a RayCluster CR. We should investigate why that is the case.
Reproduction script
Tear down a Ray cluster by deleting a RayCluster CR and observe the Ray pod's state, e.g. with `watch -n 1 kubectl get pod`.
It might take a few minutes for the pod to terminate. There's no reason for a Ray pod to take so long to process SIGTERM and exit cleanly.
Anything else
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
I don't observe this issue at the moment; it takes 30 seconds to terminate both the Ray head and worker Pods.
I guess 30 seconds is better than the minutes claimed in the issue description. But why does it take a full 30 seconds to do the termination? That sounds like the default K8s grace period, which suggests a SIGKILL to PID 1 is required to stop a Ray pod.
Probably the process running `ray start --block` does not clean up its child processes when it receives a SIGTERM.
There's at least no SIGTERM handling that I can see in the code.
I vaguely recall complaining about a similar issue to @rickyyx
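For reference, here is a minimal sketch (not Ray's or KubeRay's actual code) of the kind of SIGTERM handling a `ray start --block`-style entrypoint would need: catch SIGTERM, forward it to the child processes, reap them, and exit before the Kubelet escalates to SIGKILL. The "Kubelet" role is played by the parent script.

```python
import signal
import subprocess
import sys
import time

# Entrypoint that mimics `ray start --block`: it launches a long-running
# child (standing in for the raylet/GCS daemons), then blocks. On SIGTERM
# it forwards the signal to the child, reaps it, and exits cleanly.
ENTRYPOINT = r"""
import signal, subprocess, sys, time

child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(600)"])

def handle_sigterm(signum, frame):
    child.terminate()   # forward SIGTERM to the child process
    child.wait()        # reap it so nothing is left for SIGKILL
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

while True:             # block, like `ray start --block`
    time.sleep(1)
"""

# Play the Kubelet: start the entrypoint, send SIGTERM, and check that it
# exits well within a 30-second grace period.
proc = subprocess.Popen([sys.executable, "-c", ENTRYPOINT])
time.sleep(1)                      # give the handler time to get installed
proc.send_signal(signal.SIGTERM)
exit_code = proc.wait(timeout=10)
print("clean exit code:", exit_code)  # → clean exit code: 0
```

With handling like this, the pod would terminate as soon as the daemons exit instead of sitting out the full grace period.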
> But why does it take a full 30 seconds to do the termination? That sounds like the default K8s grace period,
That makes sense to me. kubectl uses a default grace period of 30 seconds (doc), but I am not sure whether client-go has the same behavior.
> Probably the process running `ray start --block` does not clean up its child processes when it receives a SIGTERM.
I think so.
I believe Ray treats SIGTERM as an expected exit code for `ray start --block`; I guess it's always the SIGKILL that's killing the Ray pod.
> I believe ray is treating SIGTERM as expected exit codes for `ray start --block`, I guess it's always the SIGKILL that's killing the ray pod.
Is there any handling of a SIGTERM sent to the `ray start --block` process itself? That is what we need for correct termination on K8s.
i see, so kuberay sends a SIGTERM to the entrypoint process itself?
> i see, so kuberay sends a SIGTERM to the entrypoint process itself?
More or less.
Technically, Kubernetes (more specifically, the Kubelet) sends the SIGTERM when the KubeRay operator (or any other agent) marks the pod for deletion. The Kubelet then waits for a configurable timeout (`terminationGracePeriodSeconds`, 30 seconds by default) before sending SIGKILL. I am personally a little hazy on how Kubernetes handles the non-entrypoint processes; it might depend on the choice of container runtime.
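To illustrate the escalation (a simplified simulation, not the Kubelet's actual code): an entrypoint that does not act on SIGTERM survives the whole grace period and only dies to SIGKILL, which matches the ~30-second delay observed here.

```python
import signal
import subprocess
import sys
import time

# A stubborn entrypoint that ignores SIGTERM, standing in for a pod whose
# main process does not act on the signal.
proc = subprocess.Popen([sys.executable, "-c",
    "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(600)"])
time.sleep(1)  # let the child install SIG_IGN

GRACE_PERIOD = 3  # Kubernetes defaults to 30 seconds; shortened for the demo
proc.send_signal(signal.SIGTERM)           # step 1: polite request
deadline = time.monotonic() + GRACE_PERIOD
while time.monotonic() < deadline and proc.poll() is None:
    time.sleep(0.1)                        # step 2: wait out the grace period

if proc.poll() is None:
    proc.kill()                            # step 3: SIGKILL, cannot be ignored

exit_code = proc.wait()
print("exit code:", exit_code)  # negative signal number on POSIX: -9 (SIGKILL)
```

The full grace period is paid only when the entrypoint ignores (or never receives) the SIGTERM; a process that handles it exits immediately at step 1.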
I see, I did a local test. I think sending the SIGTERM to the entrypoint process (`ray start --block`) does exit the process. But if the SIGTERM is sent to other Ray processes like the raylet, the entrypoint process does not exit.
> I think sending the SIGTERM to the entrypoint process (`ray start --block`) does exit the process
It definitely would exit the Python interpreter! But I bet Ray would continue to run after you do that.
It wasn't running in my case. Worth validating with a KubeRay pod.