Le Duc Manh

12 comments by Le Duc Manh

Hi @kevin85421. We are considering using KubeRay and Ray Serve for our production model servers. We want an async feature, and we plan to use FastAPI background tasks for running...

I'm willing to try submitting a PR for the fix as well, but I'll need some help getting started on how and where to fix it.

> Do you mean: the user sends a request → a Ray Serve replica triggers a heavy workload → it returns a response without waiting for the heavy workload to...

I set up a long-running endpoint (it sleeps for 5 minutes) and can see that the request got hung up during cluster rotation. It seems that regular requests are not drained...

After taking a look at the code, it seems the current logic is to delete the old cluster after a 60-second wait ([ref](https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/rayservice_controller.go#L565)). One possible fix I can think of is...
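One way a fixed wait could be replaced (shown only as an illustration, not necessarily the fix this comment goes on to propose) is a drain loop: keep the old cluster until it reports zero in-flight requests, and fall back to deleting it only once a timeout expires. The sketch below is plain Python showing just that control flow. `wait_for_drain`, the `in_flight` callback, and `FakeCluster` are hypothetical names; the real controller is Go code in `rayservice_controller.go` with its own APIs.

```python
import time
from typing import Callable


def wait_for_drain(
    in_flight: Callable[[], int],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until nothing is in flight or the timeout expires.

    Returns True if the old cluster drained, False if the timeout was hit
    and the caller would delete it anyway.
    """
    deadline = clock() + timeout_s
    while True:
        if in_flight() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


class FakeCluster:
    """Hypothetical stand-in: one request finishes per poll interval."""

    def __init__(self, pending: int) -> None:
        self.pending = pending
        self.now = 0.0

    def in_flight(self) -> int:
        return self.pending

    def sleep(self, seconds: float) -> None:
        # Advance a fake clock instead of really sleeping.
        self.now += seconds
        self.pending = max(0, self.pending - 1)


# Drains after three polls, well inside the 60-second bound.
quick = FakeCluster(pending=3)
drained = wait_for_drain(quick.in_flight, timeout_s=60,
                         clock=lambda: quick.now, sleep=quick.sleep)

# Still busy when the 5-second bound is reached, so the wait gives up.
slow = FakeCluster(pending=100)
timed_out = not wait_for_drain(slow.in_flight, timeout_s=5,
                               clock=lambda: slow.now, sleep=slow.sleep)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; the timeout still caps how long a stuck old cluster can linger, which is the role the fixed 60 seconds plays today.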

> Are the heavy workloads separate Ray jobs?

No, we will just launch a FastAPI background job that calls other Ray Deployments after a response has been returned...
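This respond-then-continue pattern is what makes draining tricky: the client already has its response while work is still running on the replica. A minimal dependency-free sketch, using plain `asyncio.create_task` as a stand-in for FastAPI's `BackgroundTasks` (whose mechanics differ in detail) and a short sleep as a stand-in for the calls to other Ray Deployments. All names here are hypothetical.

```python
from __future__ import annotations

import asyncio

completed: list[str] = []


async def heavy_workload(job_id: str) -> None:
    # Stand-in for the follow-up calls to other deployments.
    await asyncio.sleep(0.2)
    completed.append(job_id)


async def handle_request(job_id: str, background: set[asyncio.Task]) -> dict:
    # Schedule the heavy work and return right away without awaiting it.
    task = asyncio.create_task(heavy_workload(job_id))
    background.add(task)  # keep a strong reference while it runs
    task.add_done_callback(background.discard)
    return {"status": "accepted", "job_id": job_id}


async def main() -> tuple[dict, list[str], list[str]]:
    background: set[asyncio.Task] = set()
    response = await handle_request("job-1", background)
    # The caller already has its response, but the work is not done yet,
    # so a drain that only tracks open responses would see nothing pending.
    done_at_response = list(completed)
    # Waiting on the tracked tasks is what a graceful shutdown would need.
    await asyncio.gather(*background)
    return response, done_at_response, list(completed)


response, done_at_response, done_after_drain = asyncio.run(main())
```

Keeping the spawned tasks in a set serves two purposes: asyncio only holds weak references to running tasks, so the set keeps them alive, and it gives a shutdown hook something concrete to wait on.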

@kevin85421 May I ask for your opinion on the Serve shutdown feature for KubeRay? If it sounds reasonable, I can help create the PR for it.

Implementing the feature is not going to be easy because during the old cluster's rotation, the head node service has already been rotated. There are 2 ways I can think of:...

@ok-scale Yes, but it only supports returning the desired number of replicas, not which replica to scale down or terminate.
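Assuming "it" refers to a scaling-policy hook whose only output is a target replica count, the toy example below shows why that is limiting for graceful shutdown: the count alone cannot steer the decision toward an idle replica, so whatever rule picks the victims may stop a busy one. Every name here (`Replica`, `desired_replicas`, `victims_by_position`, `victims_idle_first`) is hypothetical and is not Ray Serve's API or its actual replica-selection logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    ongoing_requests: int


def desired_replicas(replicas: list[Replica], target_per_replica: int) -> int:
    # Count-only decision: the answer says "how many", never "which".
    total = sum(r.ongoing_requests for r in replicas)
    return max(1, -(-total // target_per_replica))  # ceiling division


def victims_by_position(replicas: list[Replica], desired: int) -> list[Replica]:
    # With only a count, victims are chosen by a rule that ignores load;
    # list order is used here purely for illustration.
    return replicas[desired:]


def victims_idle_first(replicas: list[Replica], desired: int) -> list[Replica]:
    # A replica-aware choice: stop the least busy replicas first.
    ranked = sorted(replicas, key=lambda r: r.ongoing_requests)
    return ranked[: max(0, len(replicas) - desired)]


replicas = [Replica("a", 0), Replica("b", 1), Replica("c", 3)]
desired = desired_replicas(replicas, target_per_replica=2)
by_position = [r.name for r in victims_by_position(replicas, desired)]
idle_first = [r.name for r in victims_idle_first(replicas, desired)]
```

Both selectors agree on scaling from 3 replicas to 2, but the load-blind rule stops `c` (3 ongoing requests) while the replica-aware rule stops the idle `a`.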