kuberay
[Feature] Ray Serve CR and Controller
Search before asking
- [X] I had searched in the issues and found no similar feature requirement.
Description
We would like to contribute a controller, embedded in kuberay, to operate a Ray Serve application on top of a kuberay cluster.
apiVersion: serve.ray.io/v1
kind: ServingCluster
metadata:
  name: .
# status is populated by the operator; users can run `kubectl get serve_deployments name | jq .metadata.status` to read the field.
status: UPDATING|HEALTHY|UNHEALTHY
spec:
  healthCheckConfig: # optional
    health_period_s: 5s
    consecutive_failures_threshold: 3
  serveConfig:
    - deploymentClass: .
      numReplicas: 2
      rayActorOptions: .
  rayClusterConfig:
    apiVersion: cluster.ray.io/v1
    kind: RayCluster
    metadata:
      generateName: .
    spec:
      maxWorkers: 2
      podTypes:
        - name: head
          rayResources: .
          podConfig:
            apiVersion: v1
            kind: Pod
            metadata:
              generateName: .
            spec:
              containers:
                - name: ray-node
                  image: my_registry/container:v1
This operator performs health checks, handles the initial deployment and redeployment of the Serve app on the kuberay cluster, and rotates the cluster if the Serve application fails. The CR will expose the health-check status of the Serve application.
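As a rough illustration of how `consecutive_failures_threshold` could drive the `status` field, here is a minimal Python sketch. The `HealthTracker` class and its `record` method are hypothetical names used only for this example; the actual controller would live in the kuberay operator, and the design doc may define the transitions differently.

```python
from dataclasses import dataclass
from enum import Enum


class ServeStatus(Enum):
    UPDATING = "UPDATING"
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"


@dataclass
class HealthTracker:
    """Hypothetical helper: counts consecutive failed probes for one Serve app."""

    consecutive_failures_threshold: int = 3
    failures: int = 0
    status: ServeStatus = ServeStatus.UPDATING

    def record(self, probe_ok: bool) -> ServeStatus:
        """Fold one health-probe result into the status and return it."""
        if probe_ok:
            # Any successful probe resets the failure streak.
            self.failures = 0
            self.status = ServeStatus.HEALTHY
        else:
            self.failures += 1
            # Only flip to UNHEALTHY once the streak reaches the threshold.
            if self.failures >= self.consecutive_failures_threshold:
                self.status = ServeStatus.UNHEALTHY
        return self.status


tracker = HealthTracker(consecutive_failures_threshold=3)
history = [tracker.record(ok) for ok in (True, False, False, False)]
print([s.value for s in history])
```

In this sketch a single failed probe does not change the reported status; only a streak of failures at or above the threshold does, which avoids rotating the cluster on a transient error.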
You can find more information from this design doc
Conceptually, this is similar to SparkJob and FlinkJob in their respective operators. It is a high-level concept built on top of existing CRs.
Compared to the Ray Jobs controller/CR design, the service CR is designed to be long-running and should outlive cluster failures. However, both workloads use Ray's REST API endpoint to perform operations on the Ray cluster.
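To make the "long-running, outlives cluster failure" behavior more concrete, the following Python sketch shows one way a reconcile step could pick its next action. The `Action` enum, the `next_action` function, and its parameters are all assumptions made for illustration; they are not taken from the design doc or from any kuberay API.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CREATE_CLUSTER = "create_cluster"
    DEPLOY = "deploy"
    ROTATE_CLUSTER = "rotate_cluster"
    NOOP = "noop"


def next_action(
    cluster_ready: bool,
    deployed_config: Optional[str],
    desired_config: str,
    status: str,
) -> Action:
    """Hypothetical reconcile decision for one ServingCluster resource."""
    if not cluster_ready:
        # The service outlives its cluster: a missing cluster is recreated.
        return Action.CREATE_CLUSTER
    if status == "UNHEALTHY":
        # A failed Serve application triggers a cluster rotation.
        return Action.ROTATE_CLUSTER
    if deployed_config != desired_config:
        # Covers both the initial deployment (None) and redeployment on change.
        return Action.DEPLOY
    return Action.NOOP


print(next_action(True, None, "config-v1", "UPDATING").value)
```

The ordering here (recreate, then rotate, then deploy) is one possible choice: it assumes that an unhealthy application should be moved to a fresh cluster before any new config is applied.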
Use case
- Deploy Ray Serve applications reliably on a K8s cluster.
- Manage Serve applications in a cloud-native way.
- Provide an entrypoint to highly available applications on Ray.
Related issues
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
Hi. We have finalized some discussion about the design of the new k8s operator for Serve Deployment and RayCluster management. Here is our design doc. We would like to hear feedback from the committee so that we can align. Also @simon-mo. One example of what we want to discuss is how to add this new operator and what the repo package structure should look like.
This has been done.