[Feature] Configurable RayCluster readiness definition

Open skliarpawlo opened this issue 2 years ago • 2 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Description

Currently, a RayCluster resource is considered ready as soon as it is created in Kubernetes. It would be great to have an option to consider it ready only when the head node is ready and the min_replicas count is reached for each worker group.
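
To make the requested definition concrete, here is a minimal Python sketch of the readiness rule. `WorkerGroupState` and `cluster_is_ready` are hypothetical names made up for illustration and are not part of KubeRay; how the head readiness and the per-group replica counts are obtained is left open.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WorkerGroupState:
    """Hypothetical snapshot of one worker group (not a KubeRay type)."""
    name: str
    min_replicas: int
    ready_replicas: int


def cluster_is_ready(head_ready: bool, groups: Iterable[WorkerGroupState]) -> bool:
    """Ready means: the head is ready and every worker group has
    at least min_replicas ready replicas."""
    return head_ready and all(g.ready_replicas >= g.min_replicas for g in groups)
```

With this rule, a cluster whose head is ready but whose worker group has 1 of 2 minimum replicas ready would still count as not ready.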

Use case

In any automated pipeline, we first create a cluster and then send a payload. This requires an intermediate step in which we wait until the cluster is ready to receive the payload.
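
The intermediate wait step could look roughly like the polling helper below. This is only a sketch: `wait_until` is a hypothetical helper, the timeout and interval defaults are arbitrary, and the `check` callable stands in for whatever readiness test the pipeline uses.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 300.0,   # arbitrary default, an assumption
    interval_s: float = 5.0,    # arbitrary default, an assumption
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or the timeout elapses.

    Returns True if check() succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A pipeline would create the cluster, call `wait_until(...)` with its own readiness check, and send the payload only if the call returns `True`.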

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

skliarpawlo avatar Nov 06 '23 18:11 skliarpawlo

https://github.com/ray-project/kuberay/issues/533

kevin85421 avatar Jun 29 '24 23:06 kevin85421

Chatted with @rueian today. Currently, we redefine "ready" with a new RayCluster condition called RayClusterReady. This condition indicates whether all Ray Pods are ready when the RayCluster is first created. After RayClusterReady is set to true for the first time, it only indicates whether the RayCluster's head Pod is ready for requests. The definition of "ready" in the first stage is somewhat "configurable", while in the second stage, it is controlled by the Ray Autoscaler.

If the new definition doesn't work well, we will add a new field for each worker group in the CRD so that users can explicitly define "ready".
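
For readers who want to consume the condition from a script, a minimal sketch follows. It assumes the RayCluster object has already been fetched as a plain dict and that its `status.conditions` entries follow the usual Kubernetes convention of `type` plus a string `status` of "True", "False", or "Unknown". The condition name `RayClusterReady` is taken from this comment and may differ in a given release, so it is a parameter; `condition_is_true` is a hypothetical helper.

```python
from typing import Any, Mapping


def condition_is_true(
    raycluster: Mapping[str, Any],
    cond_type: str = "RayClusterReady",  # name as used in this thread; an assumption
) -> bool:
    """Return True only if the named condition exists and its status is "True"."""
    status = raycluster.get("status") or {}
    conditions = status.get("conditions") or []
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    # Condition missing: treat the cluster as not ready.
    return False
```

A missing condition is treated as not ready, so a freshly created RayCluster with no status yet does not pass the check.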

kevin85421 avatar Jul 31 '24 16:07 kevin85421