kueue
Support KubeRay RayService as a Kueue workload
What would you like to be added:
Today Kueue supports RayJob and RayCluster as workloads but does not support RayService. I've heard feedback from some KubeRay users asking for RayService support. As with RayCluster, we should support RayService as a kueue-able workload, but without autoscaling support.
Why is this needed:
RayService is the only KubeRay resource not supported by Kueue. We should support it for full feature parity with KubeRay.
Completion requirements:
This enhancement requires the following artifacts:
- [ ] Design doc
- [ ] API change
- [X] Docs update
The artifacts should be linked in subsequent comments.
+1
Advanced scheduling features like Topology Aware Scheduling (TAS) and All-or-Nothing with Ready Pods are essential for production-grade inference workloads.
@weizhaowz do you have cycles to implement this?
Thank you folks for driving that!
Initially I tried adding a RayService controller, webhook, and multikueue-adapter in a PR, but in testing I found that the RayCluster created for the RayService cannot be updated because the RayCluster is managed by its own controller, so KubeRay cannot provision the RayCluster. Therefore, we decided to let Kueue manage RayService through RayCluster, and this PR contains the details.
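For readers following along, here is a minimal sketch of what "managing RayService through RayCluster" could look like from the user's side. It is an illustration, not taken from the PR. It assumes that the standard Kueue queue label `kueue.x-k8s.io/queue-name` set on the RayService reaches the RayCluster that KubeRay generates, and that in-tree autoscaling stays disabled as proposed in the issue. The helper name `ray_service_manifest` and the queue name `user-queue` are hypothetical.

```python
import json

# Standard Kueue label used to select a LocalQueue.
QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def ray_service_manifest(name: str, local_queue: str) -> dict:
    """Build a skeleton RayService manifest labeled for a Kueue LocalQueue.

    Assumption: the label on the RayService is propagated to the RayCluster
    that KubeRay creates, and Kueue admits that RayCluster.
    Head and worker group specs are omitted to keep the sketch short.
    """
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayService",
        "metadata": {
            "name": name,
            "labels": {QUEUE_LABEL: local_queue},
        },
        "spec": {
            "rayClusterConfig": {
                # Autoscaling is out of scope per the issue description.
                "enableInTreeAutoscaling": False,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(json.dumps(ray_service_manifest("demo-service", "user-queue"), indent=2))
```

The point of the sketch is that the user only labels the RayService. Under the assumption above, queueing and admission then happen on the generated RayCluster rather than through a dedicated RayService controller.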
Thanks @weizhaowz
/close
@andrewsykim: Closing this issue.