[Feature] Zero-downtime upgrades for long-running requests
Search before asking
- [x] I have searched the issues and found no similar feature request.
Description
As far as I can tell, when zero-downtime upgrades are enabled (the default), the old cluster's resources are deleted 60s after the new cluster's resources become ready. This means any task still running on the old cluster at that point may be cancelled before it completes. This is especially a problem because async request support is officially in the works for Serve (and we already use it ourselves).
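To make the timing concrete, here is a minimal sketch of the behavior described above. It assumes the old cluster is torn down exactly 60s after the new one becomes ready. `DELETION_DELAY`, `old_cluster_deletion_time` and `request_is_cut_off` are hypothetical names for illustration only, not KubeRay code.

```python
from datetime import datetime, timedelta

# Assumption: models the fixed 60s delay described above (hypothetical name).
DELETION_DELAY = timedelta(seconds=60)


def old_cluster_deletion_time(new_ready_at: datetime) -> datetime:
    """Time at which the old cluster is deleted under the current behavior."""
    return new_ready_at + DELETION_DELAY


def request_is_cut_off(request_start: datetime,
                       request_duration: timedelta,
                       new_ready_at: datetime) -> bool:
    """True if a request on the old cluster is still running when it is deleted."""
    request_end = request_start + request_duration
    return request_end > old_cluster_deletion_time(new_ready_at)


if __name__ == "__main__":
    ready = datetime(2024, 1, 1, 12, 0, 0)
    # A request that started 30s before the new cluster became ready.
    start = ready - timedelta(seconds=30)
    print(request_is_cut_off(start, timedelta(minutes=5), ready))   # True
    print(request_is_cut_off(start, timedelta(seconds=20), ready))  # False
```

Any request whose remaining runtime exceeds the fixed delay is cut off, regardless of how close it is to finishing.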
The PR linked below already adds an option for choosing between upgrade strategies, so the groundwork is already in place.
A few possible solutions:
- Expose the 60s delay constant as a configurable setting, so you can simply set it to the maximum runtime of your requests (simplest).
- Change the deletion criteria to wait until no replica on the old cluster has any ongoing requests (more robust, but also more error-prone).
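The two options could combine into a single deletion check. The sketch below is a rough illustration under stated assumptions: `ReplicaStatus`, `should_delete_old_cluster`, `wait_for_drain` and `max_drain_wait` are all hypothetical names. It also assumes per-replica ongoing-request counts are somehow available to the operator, which is not an existing KubeRay or Serve API. The optional `max_drain_wait` cap is an added assumption aimed at the error-prone part of option 2, so a stuck request or a stale count cannot block deletion forever.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class ReplicaStatus:
    # Hypothetical snapshot of one replica on the old cluster.
    name: str
    ongoing_requests: int


def should_delete_old_cluster(
    new_ready_at: datetime,
    now: datetime,
    replicas: Sequence[ReplicaStatus],
    deletion_delay: timedelta = timedelta(seconds=60),  # option 1: configurable
    wait_for_drain: bool = False,                       # option 2: drain check
    max_drain_wait: Optional[timedelta] = None,         # assumed safety cap
) -> bool:
    elapsed = now - new_ready_at
    # Never delete before the (now configurable) delay has elapsed.
    if elapsed < deletion_delay:
        return False
    # Current behavior: delete as soon as the delay has passed.
    if not wait_for_drain:
        return True
    # Give up waiting once the cap is reached, even if requests remain.
    if max_drain_wait is not None and elapsed >= max_drain_wait:
        return True
    # Otherwise delete only when every replica is idle.
    return all(r.ongoing_requests == 0 for r in replicas)


if __name__ == "__main__":
    ready = datetime(2024, 1, 1, 12, 0, 0)
    busy = [ReplicaStatus("replica-a", 2), ReplicaStatus("replica-b", 0)]
    now = ready + timedelta(seconds=90)
    print(should_delete_old_cluster(ready, now, busy))                       # True
    print(should_delete_old_cluster(ready, now, busy, wait_for_drain=True))  # False
```

With the defaults this reproduces today's fixed 60s behavior, so either option could be opt-in behind the upgrade strategy setting.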
Use case
A deployment that serves long-running inference or quick fine-tuning jobs (e.g., 60s to 5m). Currently, a zero-downtime upgrade may delete the old cluster while these requests are still running.
Related issues
https://github.com/ray-project/kuberay/pull/2468
Are you willing to submit a PR?
- [x] Yes, I am willing to submit a PR!
Hi @Stack-Attack, are you currently working on this? If not, I would like to work on it. Thanks!
@machichima I just came back to check on this and start work, but it's fantastic that it's already done. Thanks :)!