[Bug] HeadGroupSpec.Replicas is unused
Search before asking
- [X] I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
This bug was found via #605. When I updated the value of headGroupSpec.replicas from 1 to 2, it broke this rule. The number of workers is determined by worker.Replicas (code), but HeadGroupSpec.Replicas is unused in KubeRay.
Possible solutions:
(1) Remove the HeadGroupSpec.Replicas field.
(2) Create head pods based on HeadGroupSpec.Replicas.
Reproduction script
- Update the value of headGroupSpec.replicas in your YAML file from 1 to 2 (see the sketch below).
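For reference, here is a minimal sketch of the relevant part of a RayCluster manifest; the metadata name, image tag, group name, and rayStartParams values are placeholders and not taken from a specific KubeRay example:

```yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-sample              # placeholder name
spec:
  headGroupSpec:
    replicas: 2                        # changed from 1 to 2; the operator ignores this field
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.0.0   # placeholder image tag
  workerGroupSpecs:
    - groupName: small-group           # placeholder group name
      replicas: 1                      # the operator uses this field to size the worker group
      minReplicas: 1
      maxReplicas: 3
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.0.0   # placeholder image tag
```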
Anything else
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
cc @DmitriGekhtman
Thank you for tracking this! See also https://github.com/ray-project/kuberay/pull/362, which made the field optional.
I think we should remove the field in the next RayCluster API version.
So the solution is bullet (1) in your proposal. I believe I've already removed the field from all example docs and documented the field as deprecated in raycluster_types.go.
Oh, actually I think I left it in the examples for compatibility with KubeRay 0.2.0. But now that we've released KubeRay 0.3.0, we can remove the field from all examples! @kevin85421 would you mind doing the honors of hunting down and deleting all instances of HeadGroupSpec.replicas in example yamls?
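For illustration, the cleanup in each example manifest amounts to dropping one line under headGroupSpec (a sketch; the surrounding fields and values are placeholders):

```yaml
headGroupSpec:
  # replicas: 1   <- delete this line; the operator never reads HeadGroupSpec.Replicas
  rayStartParams:
    dashboard-host: "0.0.0.0"
  template:
    spec:
      containers:
        - name: ray-head
          image: rayproject/ray:2.0.0   # placeholder image tag
```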
Thanks @DmitriGekhtman for the reply! Why don't we remove HeadGroupSpec.Replicas from raycluster_types.go? I tried removing the field from raycluster_types.go, and backward compatibility is still preserved.
# Step 1: Remove `HeadGroupSpec.Replicas` from `raycluster_types.go`
# Step 2: Update the CRD / generated API
make sync
# Step 3: Regenerate `zz_generated_deepcopy.go`
make generate
# Step 4: Build the Docker image
make docker-image
# Step 5: Install the new KubeRay operator (controller:latest) and the new CRD
# Step 6: Install a RayCluster that still sets `HeadGroupSpec.Replicas`
# Works as expected.
We have an issue to track this, and we need to do some cleanup on the API.
Related issue: #368