feat: annotate RayCluster with head node IP
Update the RayCluster's `.metadata.annotations["ray.io/data"]` with a JSON object containing the head node IP under the `head-node-ip` key.
In a later commit, we update the API server to parse the annotation and return the IP for each cluster.
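For illustration, here is a minimal sketch of how the annotation value could be built. The helper name and surrounding wiring are hypothetical, not the actual KubeRay reconciler code; only the annotation key (`ray.io/data`) and the JSON key (`head-node-ip`) come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// annotateHeadNodeIP stores the head pod's IP in the "ray.io/data"
// annotation as a JSON object under the "head-node-ip" key.
// Illustrative sketch only; the real reconciler would operate on the
// RayCluster object's ObjectMeta.
func annotateHeadNodeIP(annotations map[string]string, headPodIP string) error {
	data, err := json.Marshal(map[string]string{"head-node-ip": headPodIP})
	if err != nil {
		return err
	}
	annotations["ray.io/data"] = string(data)
	return nil
}

func main() {
	ann := map[string]string{}
	if err := annotateHeadNodeIP(ann, "10.244.12.74"); err != nil {
		panic(err)
	}
	fmt.Println(ann["ray.io/data"]) // {"head-node-ip":"10.244.12.74"}
}
```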
Checks
- [x] I've made sure the tests are passing.
Testing Strategy:
- [x] Unit tests
- [ ] Manual tests
- [ ] This PR is not tested :(
Hi,
I'm not sure how the ApiServer communicates with the head nodes, but after reviewing this PR I think the head node's Service could be a safer choice than the head node's IP. A Kubernetes Service is automatically resolvable within the cluster's network domain, and it should guarantee delivery to the pod since it is a resource managed by Kubernetes itself.
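As an illustration of that point, a client inside the cluster could reach the head through the Service's DNS name rather than a pod IP. This sketch assumes KubeRay's `<cluster>-head-svc` Service naming convention, a cluster named `raycluster-example` in the `default` namespace, and Ray's default client port 10001; none of these values come from this PR.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// The Service DNS name resolves cluster-wide and survives head pod
	// restarts, unlike a pod IP. Name and port are assumptions for
	// illustration.
	addr := "raycluster-example-head-svc.default.svc.cluster.local:10001"
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```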
I am not sure that adding the IP address of the head node as an annotation is good practice, especially since IP addresses are ephemeral and can change. Also, the IP address is already part of the pod status:
```yaml
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-09T09:17:46Z"
    status: "True"
    type: Initialized
  hostIP: 10.1.16.39
  phase: Running
  podIP: 10.244.12.74
  podIPs:
  - ip: 10.244.12.74
```
Why not, as Christos mentioned, use the service name? Also, if a component can read the annotations, is there any reason it cannot read the status as well and get the IP address from there?
Finally, the IP is assigned at a later stage, after the pod is created by the Ray Operator. So the Ray Operator might not be the best entity to read the IP and annotate the cluster with it (unless we poll the API server to get the IP).
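For reference, a component could read the IP straight from the pod status with a plain client-go lookup. This is an illustrative sketch, not KubeRay code; the `rayNodeType=head` label selector is borrowed from later in this thread, and a caller would pass a clientset built from, e.g., `rest.InClusterConfig()`.

```go
package example

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// headPodIP looks up the head pod by label and returns its IP from
// .status.podIP, the same field shown in the snippet above.
func headPodIP(ctx context.Context, client kubernetes.Interface, namespace string) (string, error) {
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "rayNodeType=head",
	})
	if err != nil {
		return "", err
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase == corev1.PodRunning && pod.Status.PodIP != "" {
			return pod.Status.PodIP, nil
		}
	}
	return "", fmt.Errorf("no running head pod with an assigned IP in %q", namespace)
}
```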
Can you please elaborate more on why this change is needed?
Thanks!
Thanks for reviewing. I just want the API to include the head IP when getting or listing clusters so that users don't have to look it up themselves, which requires knowledge of K8s.
Should the API itself do the work of finding the IP by looking up the head Pod, instead of the controller? If so, is it OK for the API to do this work each time it lists or gets clusters, or should it save the IP somewhere?
Regarding Pod vs. Service IP: my use case is GKE with clients that connect to the head pod from outside the K8s cluster. In this case the Service IP is not routable outside the cluster, so we need to connect directly to the Pod IP. Any ideas how to accommodate this use case?
> Any ideas how to accommodate this use case?
Can we make the API return both the Pod and Service IPs?
@davidxia for the GKE case, when you need external access to the cluster, an alternative is to use a K8s Service of type LoadBalancer or NodePort. Exposing an individual pod IP address externally is a very unusual case. Of all the customers I worked with at IBM and Microsoft, we never had a single client expose a pod externally directly. You can do it through an Ingress, a Service, or a cloud load balancer/gateway.
I am not exactly sure what you mean by wanting the API to expose the IP address. Would something like `kubectl get pod -owide -l=rayNodeType=head` work in your case, or do you need more?
> I am not exactly sure what you mean by wanting the API to expose the IP address. Would something like `kubectl get pod -owide -l=rayNodeType=head` work in your case, or do you need more?
Yes, that kubectl command is what I'm doing now. But I'm hoping KubeRay can abstract away more of the K8s implementation details from the user. In particular, I'd like the API endpoints in https://github.com/ray-project/kuberay/tree/master/apiserver to return the head IP in cluster entities. Does that make sense?
> Exposing an individual pod IP address externally is a very unusual case.
At Spotify we use a single GCP VPC with GKE clusters connected to that VPC, so all Pod IPs are routable from within the VPC, including from offices and VPN. We don't use Services with load balancers because for a long time GCP didn't support internal load balancers whose IPs are routable globally: they were either routable globally but exposed to the public Internet, or internal but regional. Because of how we set up our project permissions, creating Services with internal load balancers for many RayClusters is now difficult, and I'm also not sure the KubeRay operator supports this. That's why I'm hoping you're open to adding both the Service and Pod IPs here. Otherwise we can't really use this feature even if it's accepted.
Happy to provide more context!
@davidxia I'm just going over the open PRs. Are you still potentially interested in merging this?
My two cents: to me, it seems reasonable to include the head pod's IP under the RayCluster CR's status field. I agree that putting it in annotations is a little odd.
cc @akanso @Jeffwan for thoughts
@DmitriGekhtman, I agree: the IP is a runtime-generated value, and having it in the status field makes sense to me.
Great, I can rework this to use the status field. How does the following look? Mind if I include both the Pod and Service IPs, for the reasons described above?
```yaml
status:
  head:
    podIP: string
    serviceIP: string
```
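For concreteness, a hypothetical mapping of that schema onto Go API types could look like the following. Field and type names are illustrative guesses, not the final KubeRay definitions:

```go
// HeadInfo captures runtime networking details of the head node.
// Hypothetical sketch of the proposed schema above.
type HeadInfo struct {
	PodIP     string `json:"podIP,omitempty"`
	ServiceIP string `json:"serviceIP,omitempty"`
}

// RayClusterStatus would gain a Head field alongside its existing fields.
type RayClusterStatus struct {
	// ...existing status fields elided...
	Head HeadInfo `json:"head,omitempty"`
}
```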
That schema looks good to me.
@DmitriGekhtman I just updated the PR to use `.status.head` instead of the annotation. Let me know if this draft looks OK, and I'll polish it up and mark it ready for review by more people.
Can we change the PR name from "annotate" to "enrich cluster status with head pod IP"?
Thanks for the initial review. Shall I go ahead and write tests? I also noticed the compatibility test failing. Is that expected?
Added unit tests. Ready for review.