kuberay
[Feature] Can RayJob run in a local cluster?
Search before asking
- [X] I had searched in the issues and found no similar feature requirement.
Description
Run a job in a local cluster, without building a remote RayCluster.
Use case
In my case, I have many small jobs that can run on a single node with few resources and finish within 60s. With a RayCluster, it costs roughly an additional 60s to build the RayCluster before the job runs its code.
Related issues
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
> Run a job in a local cluster, without building a remote RayCluster.
A RayJob can run on an existing cluster using the clusterSelector field. This way you can create a single RayCluster and then run multiple RayJobs against that RayCluster. Would that work for you?
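As a rough illustration of the clusterSelector approach, here is a minimal Python sketch that builds such a RayJob manifest as JSON (which kubectl also accepts). The job name, cluster name, and entrypoint are hypothetical, and the `ray.io/cluster` selector key reflects my understanding of the KubeRay API, so please check it against the CRD version you run.

```python
import json


def rayjob_on_existing_cluster(job_name, cluster_name, entrypoint):
    """Build a RayJob manifest that targets an already-running RayCluster.

    All argument values are illustrative; the selector key "ray.io/cluster"
    is an assumption to verify against your KubeRay version.
    """
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": job_name},
        "spec": {
            "entrypoint": entrypoint,
            # Point at the existing cluster instead of supplying rayClusterSpec,
            # so no new RayCluster has to be built for this job.
            "clusterSelector": {"ray.io/cluster": cluster_name},
        },
    }


if __name__ == "__main__":
    manifest = rayjob_on_existing_cluster(
        "small-job-1", "shared-cluster", "python /home/ray/job.py"
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to kubectl apply -f - once per small job, all against the same long-lived RayCluster.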
> Run a job in a local cluster, without building a remote RayCluster.
> you can create a single RayCluster and then run multiple RayJobs against that RayCluster.
Thanks, but that doesn't work in my case. First, we want jobs to run quickly and return a result, so we use several images with different pre-installed dependencies. Second, even with an existing cluster with HPA, KubeRay still needs to create a K8s Job to submit the job. And if there aren't enough resources, scaling the RayCluster through HPA costs additional seconds.
Could we run the RayJob in the submitter Job's Pod instead?
> KubeRay still needs to create a K8s Job to submit the job
There's an HTTP submission mode that doesn't use a submitter Job: https://github.com/ray-project/kuberay/blob/master/ray-operator/apis/ray/v1/rayjob_types.go#L92
You can use the runtime environment to install the dependencies: https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments. Simply add them here: https://github.com/ray-project/kuberay/blob/master/ray-operator/apis/ray/v1/rayjob_types.go#L87.
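To make the two suggestions concrete, here is a hedged Python sketch that builds a RayJob manifest using HTTP submission mode together with a runtime environment. The field names `submissionMode` and `runtimeEnvYAML` and the value `HTTPMode` are my reading of the linked rayjob_types.go, and combining them with `clusterSelector` is an assumption; the job name, cluster name, and package list are purely illustrative.

```python
import json


def rayjob_http_mode(job_name, cluster_name, entrypoint, pip_packages):
    """Build a RayJob manifest that skips the submitter K8s Job.

    Field names follow my reading of rayjob_types.go; verify them against
    the KubeRay version you have installed.
    """
    # runtimeEnvYAML is a string field that holds a YAML document,
    # here a simple "pip" list of per-job dependencies.
    runtime_env_yaml = "pip:\n" + "".join(f"  - {pkg}\n" for pkg in pip_packages)
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": job_name},
        "spec": {
            "entrypoint": entrypoint,
            # Ask the operator to submit over HTTP instead of creating a Job.
            "submissionMode": "HTTPMode",
            "runtimeEnvYAML": runtime_env_yaml,
            # Assumption: reuse an existing cluster (hypothetical name).
            "clusterSelector": {"ray.io/cluster": cluster_name},
        },
    }


if __name__ == "__main__":
    manifest = rayjob_http_mode(
        "small-job-2", "shared-cluster", "python /home/ray/job.py", ["requests"]
    )
    print(json.dumps(manifest, indent=2))
```

Installing dependencies through the runtime environment keeps one generic image per cluster, at the price of some install time on a job's first run.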
Thanks. We already use these features when running distributed training jobs.
In the end, I decided to use KubeRay RayJob for distributed training, and VolcanoJob or a native K8s Job for single-Pod jobs.
When running in a single Pod, I'm not sure if it's okay to run ray start --head && ray job submit
> I'm not sure if it's okay to run ray start --head && ray job submit
ray job submit sends requests to the Ray dashboard. However, it still takes a while for the Ray dashboard to be ready for job requests after the ray start command returns.
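One way to handle that gap in a single Pod is to poll the dashboard before submitting. The sketch below is only an illustration: the address `http://127.0.0.1:8265` is the default dashboard port, the `/api/version` path is an assumption about which endpoint to probe, and both function names are hypothetical helpers rather than part of Ray.

```python
import http.client
import time
import urllib.error
import urllib.request


def dashboard_is_up(url="http://127.0.0.1:8265/api/version", timeout=2.0):
    """Return True if the (assumed) dashboard endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a half-started server: not ready yet.
        return False


def wait_until_ready(probe, timeout_s=60.0, interval_s=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A container entrypoint could run ray start --head, call wait_until_ready(dashboard_is_up), and only then invoke ray job submit, failing fast if the wait returns False.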
> With a RayCluster, it costs roughly an additional 60s to build the RayCluster before the job runs its code.
If you only create a single Pod RayCluster, there should be no overhead compared to a single K8s Pod. The RayJob CRD works as follows:
1. Create a RayCluster.
2. Wait for the RayCluster to be "ready".
3. Create a submitter K8s Job.
4. Use ray job submit to submit the job to the Ray head node.
The overhead likely comes from the third step, creating the submitter K8s Job. We are currently working on a design doc, https://docs.google.com/document/d/1hCJsrCFYPJLS3Zusdr8N_4Y5leWUMy4bQEbsqSQp2mw/edit, for a way to avoid that overhead. It is still a work in progress, but feel free to leave comments and give us feedback.
Closed this issue because the proposal in https://docs.google.com/document/d/1hCJsrCFYPJLS3Zusdr8N_4Y5leWUMy4bQEbsqSQp2mw/edit?tab=t.0 has already been implemented.