kuberay
kuberay copied to clipboard
[Bug] ray service example will have worker that name is unexpectedly trimmed
Search before asking
- [X] I searched the issues and found no similar issues.
KubeRay Component
Others
What happened + What you expected to happen
in the example: https://github.com/ray-project/kuberay/blob/9bb690ec910d13068216a14e1c4aed378a36392d/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml#L4 this may lead to a too-long name problem for worker so in operator therefore it would be trimmed.
NAME READY STATUS RESTARTS AGE
ervice-sample-raycluster-fs6fl-worker-small-group-25lpg 1/1 Running 0 48m
rayservice-sample-raycluster-fs6fl-head-vmqmz 1/1 Running 0 48m
it would be wired for the names and we expect it to be the correct name prefix just like head pod
Reproduction script
just apply the example in ray-operator/config/sample/ray_v1alpha1_rayservice.yaml
Anything else
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
I ran into the same issue. btw, are we able to make the name deterministic? Now
- Head node name is
<cluster_name>-raycluster-<random_string>-head-<random_string> - Worker name is
<cluster_name>-raycluster-<random_string>-worker-small-group-<random_string>
Could we remove the random strings from the name? IIUC, <cluster_name> is already unique, so we don't need to add random strings to it, right? I think the name could be
- Head Node:
<cluster_name>-raycluster-head. - Worker Node:
<cluster_name>-raycluster-worker-small-group-<0/1/2>
In Flyte, we use the RayJob name to construct a Kubernetes dashboard URL (http://flyte:30082/#/log/{{ .namespace }}/{{ .RayJobName+"-head" }}/pod?namespace={{ .namespace }}). If the pod name is non-deterministic, we should send another request to the API server to get the pod name.
If we have to keep the name non-deterministic, could we add the name of the head node to Raycluster.status?
cc @Jeffwan
Per @scarlet25151's question, the major reason name changed is because of logic here
https://github.com/ray-project/kuberay/blob/0c98cf2797e6e0e96fbc9889509f622373f7d89f/ray-operator/controllers/ray/utils/util.go#L49-L72
DNS-1123 subdomain allows up to 253 char. @akanso Do you know why it's controlled under 64? This seems a very restricted setting. Can we update it and match with kubernetes setting?
For the enhancement @pingsutw suggests, I think
- If the non-deterministic cause any failures now, then we should definitely fix that. The underneath reason is we don't want to manage worker in stateful way, stateless makes more sense we don't need to care about the sequence no. if we use exact same name for head, then the new one has to be created after the old one is totally removed. There would be cases in the future to support HA (multi-heads), so that's why it uses random strings for both head and worker now.
- Add head pod name to
RayCluster.statusis reasonable for such cases, we can make it. If you have time, feel free to cut a PR.
BTW, how do you get application logs from RayJob now? http://flyte:30082/#/log/{{ .namespace }}/{{ .RayJobName+"-head" }}/pod?namespace={{ .namespace }} what's the underneath logic?
As mentioned by @Jeffwan, this is not a bug. Close this issue.