[Feature] Support Istio
Search before asking
- [X] I had searched in the issues and found no similar feature requirement.
Description
Support Istio with KubeRay
Use case
https://github.com/kubeflow/manifests/pull/2383
Related issues
#948
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
@kevin85421 are there any internal plans to support this?
cc @Yicheng-Lu-llll
@kevin85421 if this is supported, would it be possible to submit jobs using an external DNS? I have tried but had no luck.
I have updated my query here : https://discuss.ray.io/t/need-assistance-to-submit-job-using-external-dns/13354
Hi @kevin85421,
After digging into this for a couple of days, I have managed to make Ray work with Istio on three different levels with some modifications in KubeRay.
Level 1: Plain TCP
This is the basic level of Istio integration that doesn't do any traffic encryption and should work out of the box with the current KubeRay.
One thing to note is that, with the latest Istio 1.21.0, users MUST apply the following Istio config to let the Istio sidecar proxy the plain gRPC traffic correctly, or the Ray worker node won't be able to start.
meshConfig:
  defaultConfig:
    runtimeValues:
      envoy.reloadable_features.sanitize_te: "false"
There is an open issue for this: https://github.com/istio/istio/issues/49685
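For reference, here is a minimal sketch of how that setting could be applied through an IstioOperator resource (assuming an istioctl/Operator-based installation; the resource name is arbitrary, and the same meshConfig can instead be merged into an existing install spec or the istio ConfigMap):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  # Hypothetical name; merge into your existing IstioOperator spec if you already have one.
  name: sanitize-te-workaround
  namespace: istio-system
spec:
  meshConfig:
    defaultConfig:
      runtimeValues:
        # Disable Envoy's TE-header sanitization, which otherwise breaks Ray's
        # plain gRPC traffic (see the Istio issue linked above).
        envoy.reloadable_features.sanitize_te: "false"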
Level 2: mTLS
This is the second level of integration, where Istio must learn L4 information about Ray in order to encrypt traffic. That is, when a TCP packet enters or leaves an Istio sidecar, Istio must be able to determine where it comes from or goes to based on its source or destination IP+port tuple. In Ray, a pod connects to another pod directly, without the virtual IP of a normal Kubernetes service. So we must apply a headless service to all Ray pods so that Istio learns this L4 information correctly, for example:
apiVersion: v1
kind: Service
metadata:
  labels:
    ray.io/headless-worker-svc: raycluster-mini
  name: raycluster-mini-headless-svc
  namespace: default
spec:
  clusterIP: None
  selector:
    ray.io/cluster: raycluster-mini
  ports:
  - name: node-manager-port
    port: 6380
  - name: object-manager-port
    port: 6381
  - name: runtime-env-agent-port
    port: 6382
  - name: dashboard-agent-grpc-port
    port: 6383
  - name: dashboard-agent-listen-port
    port: 52365
  - name: metrics-export-port
    port: 8080
  - name: p10002
    port: 10002
  - name: p10003
    port: 10003
  - name: p10004
    port: 10004
  - name: p10005
    port: 10005
  ...
All the ports listed in https://docs.ray.io/en/latest/ray-core/configure.html#all-nodes, including worker ports, MUST be explicitly configured in the rayStartParams and carried to the headless service.
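As a rough sketch (not part of the original example), one way to keep that list manageable is to pin the worker port range in rayStartParams with Ray's min-worker-port/max-worker-port options, so only a small, known set of worker ports has to be mirrored into the headless service; the exact values below are an assumption chosen to match the p10002-p10005 entries above:

rayStartParams:
  node-manager-port: '6380'
  object-manager-port: '6381'
  runtime-env-agent-port: '6382'
  dashboard-agent-grpc-port: '6383'
  dashboard-agent-listen-port: '52365'
  metrics-export-port: '8080'
  # Constrain the worker port range (default 10002-19999) so each worker port
  # can be listed explicitly (p10002, p10003, ...) in the headless service.
  min-worker-port: '10002'
  max-worker-port: '10005'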
Furthermore, if Istio operates in STRICT mode, two more changes are needed on the KubeRay side:
- Disable the wait-gcs-ready init container by setting the env ENABLE_INIT_CONTAINER_INJECTION=false. This init container cannot connect to the Ray GCS in Istio STRICT mode and will never finish, because it starts before the Istio sidecar.
- The KubeRay controller talks to the underlying RayCluster of a RayService and RayJob directly. In Istio STRICT mode, the controller should also have an Istio sidecar deployed, or it should talk to the RayCluster through the kubeapi proxy.
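For context, STRICT mode here refers to an Istio PeerAuthentication policy along these lines (a mesh-wide sketch placed in the Istio root namespace, typically istio-system):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    # Reject any plaintext traffic; all workloads must talk over Istio mTLS.
    mode: STRICT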
Level 3: L7 Visibility
This is the strictest level of integration, where Istio knows the L7 details of Ray's internal traffic. We not only need to set appProtocol correctly in the above headless service, but also need to let Ray nodes use FQDN hostnames to communicate with each other internally, because Istio uses the Host/Authority header for L7 routing. That is, setting --node-ip-address on every Ray node is needed.
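For illustration, the port entries of the headless service above would then carry appProtocol hints roughly like this (the grpc/http split mirrors what the patch below hard-codes and is otherwise an assumption):

ports:
- name: node-manager-port
  port: 6380
  appProtocol: grpc
- name: object-manager-port
  port: 6381
  appProtocol: grpc
- name: dashboard-agent-grpc-port
  port: 6383
  appProtocol: grpc
- name: dashboard-agent-listen-port
  port: 52365
  appProtocol: http
- name: metrics-export-port
  port: 8080
  appProtocol: http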
Here is a quick-and-dirty hack to KubeRay that enables mTLS + L7 visibility for a RayCluster named raycluster-mini:
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-mini
spec:
  rayVersion: '2.9.0'
  headGroupSpec:
    rayStartParams:
      num-cpus: '1'
      node-manager-port: '6380'
      object-manager-port: '6381'
      runtime-env-agent-port: '6382'
      dashboard-agent-grpc-port: '6383'
      dashboard-grpc-port: '6384'
    #pod template
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0
          ports:
          - containerPort: 6379
            name: redis
          - containerPort: 8265 # Ray dashboard
            name: dashboard
          - containerPort: 10001
            name: client
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 1
    groupName: small-group
    rayStartParams:
      num-cpus: '1'
      node-manager-port: '6380'
      object-manager-port: '6381'
      runtime-env-agent-port: '6382'
      dashboard-agent-grpc-port: '6383'
    #pod template
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.9.0
The hack:
diff --git a/ray-operator/controllers/ray/common/pod.go b/ray-operator/controllers/ray/common/pod.go
index 0182ead..b79f7e5 100644
--- a/ray-operator/controllers/ray/common/pod.go
+++ b/ray-operator/controllers/ray/common/pod.go
@@ -339,6 +339,7 @@ func BuildPod(ctx context.Context, podTemplateSpec corev1.PodTemplateSpec, rayNo
ulimitCmd := "ulimit -n 65536"
// Generate the `ray start` command.
rayStartCmd := generateRayStartCommand(ctx, rayNodeType, rayStartParams, pod.Spec.Containers[utils.RayContainerIndex].Resources)
+ rayStartCmd += ` --node-ip-address $(echo "$POD_IP" | sed 's/\./-/g').raycluster-mini-headless-svc.default.svc.cluster.local`
// Check if overwrites the generated container command or not.
isOverwriteRayContainerCmd := false
@@ -580,7 +581,14 @@ func setContainerEnvVars(pod *corev1.Pod, rayNodeType rayv1.RayNodeType, rayStar
},
},
}
- container.Env = append(container.Env, rayCloudInstanceID)
+ container.Env = append(container.Env, rayCloudInstanceID, corev1.EnvVar{
+ Name: "POD_IP",
+ ValueFrom: &corev1.EnvVarSource{
+ FieldRef: &corev1.ObjectFieldSelector{
+ FieldPath: "status.podIP",
+ },
+ },
+ })
// RAY_NODE_TYPE_NAME is used by Ray Autoscaler V2 (alpha). See https://github.com/ray-project/kuberay/issues/1965 for more details.
nodeGroupNameEnv := corev1.EnvVar{
diff --git a/ray-operator/controllers/ray/common/service.go b/ray-operator/controllers/ray/common/service.go
index 7ffb8f5..becc958 100644
--- a/ray-operator/controllers/ray/common/service.go
+++ b/ray-operator/controllers/ray/common/service.go
@@ -3,7 +3,9 @@ package common
import (
"context"
"fmt"
+ "k8s.io/utils/pointer"
"sort"
+ "strings"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -68,6 +70,15 @@ func BuildServiceForHeadPod(ctx context.Context, cluster rayv1.RayCluster, label
ports := []corev1.ServicePort{}
for name, port := range ports_int {
svcPort := corev1.ServicePort{Name: name, Port: port, AppProtocol: &defaultAppProtocol}
+ switch port {
+ case 6379, 10001:
+ svcPort.AppProtocol = pointer.String("grpc")
+ case 8265:
+ svcPort.AppProtocol = pointer.String("http")
+ }
+ if strings.Contains(name, "grpc") {
+ svcPort.AppProtocol = pointer.String("grpc")
+ }
ports = append(ports, svcPort)
}
if cluster.Spec.HeadGroupSpec.HeadService != nil {
@@ -285,8 +296,7 @@ func BuildHeadlessServiceForRayCluster(rayCluster rayv1.RayCluster) (*corev1.Ser
utils.RayClusterHeadlessServiceLabelKey: rayCluster.Name,
}
selectorLabels := map[string]string{
- utils.RayClusterLabelKey: rayCluster.Name,
- utils.RayNodeTypeLabelKey: string(rayv1.WorkerNode),
+ utils.RayClusterLabelKey: rayCluster.Name, // Apply headless service to all Ray nodes.
}
headlessService := &corev1.Service{
@@ -296,11 +306,51 @@ func BuildHeadlessServiceForRayCluster(rayCluster rayv1.RayCluster) (*corev1.Ser
Labels: labels,
},
Spec: corev1.ServiceSpec{
- ClusterIP: "None",
- Selector: selectorLabels,
- Type: corev1.ServiceTypeClusterIP,
+ ClusterIP: "None",
+ Selector: selectorLabels,
+ Type: corev1.ServiceTypeClusterIP,
+ PublishNotReadyAddresses: true, // Ray head will not be able to start without this.
+ Ports: []corev1.ServicePort{
+ { // All the following port definitions, including worker ports, should be parsed from the rayStartParams
+ Name: "node-manager-port",
+ AppProtocol: pointer.String("grpc"),
+ Port: 6380,
+ },
+ {
+ Name: "object-manager-port",
+ AppProtocol: pointer.String("grpc"),
+ Port: 6381,
+ },
+ {
+ Name: "runtime-env-agent-port",
+ AppProtocol: pointer.String("grpc"),
+ Port: 6382,
+ },
+ {
+ Name: "dashboard-agent-grpc-port",
+ AppProtocol: pointer.String("grpc"),
+ Port: 6383,
+ },
+ {
+ Name: "dashboard-agent-listen-port",
+ AppProtocol: pointer.String("http"),
+ Port: 52365,
+ },
+ {
+ Name: "metrics-export-port",
+ AppProtocol: pointer.String("http"),
+ Port: 8080,
+ },
+ },
},
}
+ for i := int32(10002); i < 10100; i++ {
+ headlessService.Spec.Ports = append(headlessService.Spec.Ports, corev1.ServicePort{
+ Name: fmt.Sprintf("p%d", i),
+ AppProtocol: pointer.String("grpc"),
+ Port: i,
+ })
+ }
return headlessService, nil
}
diff --git a/ray-operator/controllers/ray/raycluster_controller.go b/ray-operator/controllers/ray/raycluster_controller.go
index 63b584c..3d51cec 100644
--- a/ray-operator/controllers/ray/raycluster_controller.go
+++ b/ray-operator/controllers/ray/raycluster_controller.go
@@ -593,6 +593,7 @@ func (r *RayClusterReconciler) reconcileHeadlessService(ctx context.Context, ins
}
}
+ isMultiHost = true // Always create the headless service
if isMultiHost {
services := corev1.ServiceList{}
options := common.RayClusterHeadlessServiceListOptions(instance)
diff --git a/ray-operator/controllers/ray/utils/constant.go b/ray-operator/controllers/ray/utils/constant.go
index 9bad224..0302324 100644
--- a/ray-operator/controllers/ray/utils/constant.go
+++ b/ray-operator/controllers/ray/utils/constant.go
@@ -85,8 +85,8 @@ const (
ComponentName = "kuberay-operator"
// The default suffix for Headless Service for multi-host worker groups.
- // The full name will be of the form "${RayCluster_Name}-headless-worker-svc".
- HeadlessServiceSuffix = "headless-worker-svc"
+ // The full name will be of the form "${RayCluster_Name}-headless-svc".
+ HeadlessServiceSuffix = "headless-svc"
// Use as container env variable
RAY_CLUSTER_NAME = "RAY_CLUSTER_NAME"
Summary
The above hacky patch made a couple of changes to KubeRay to enable mTLS + L7 visibility with Istio:
- Extend the current headless service for TPU multi-host worker groups to all Ray nodes. This generates an FQDN for each Ray node.
- Hard-code all port definitions in the headless service for this demo. This fulfills the criteria for auto mTLS.
- Specify --node-ip-address FQDN on all Ray nodes. This, along with the correct appProtocol, fulfills the criteria for L7 visibility.
Related issues: https://github.com/ray-project/kuberay/issues/948 https://github.com/ray-project/kuberay/issues/1025 https://github.com/ray-project/kuberay/issues/1946
Discussions
Although the above patch doesn't change many lines, it raises some critical issues:
- All the ports must be explicitly stated in rayStartParams, but the default worker port range is too large. I tried to state all the default 10002~19999 ports in my local Kind cluster and they caused all pods to be killed by OOM. We should probably document this memory issue and recommend that users use a smaller worker port range if they want to enable auto mTLS.
- The --node-ip-address FQDN option may bring a compatibility issue: the old kube-dns format uses <pod-ip>.<headless-svc>.<ns>.cluster.local, but the new spec uses <pod-hostname>.<headless-svc>.<ns>.cluster.local. CoreDNS keeps the former format for backward compatibility, but other implementations may only accept the latter.
Thank you @rueian, I hope that https://github.com/istio/istio/issues/49685 gets fixed soon then.
Relevant KEP in Kubernetes: https://github.com/kubernetes/enhancements/pull/2611
There were some previous discussions around supporting port ranges as well, but I can't seem to find that discussion.
https://github.com/ray-project/ray/pull/44801 Ruei-An has already added a doc for Istio. Closing this issue.
@rueian Do these changes need a specific version of kube-dns?
From my previous testing of KubeRay and Istio, it's not supported with kube-dns because kube-dns does not support Pod FQDN resolution for headless services. This is, however, supported by CoreDNS and most other DNS providers. @rueian can correct me if I'm wrong.
@andrewsykim Is it because the FQDN format in kube-dns supports only the hostname and not the pod IP?
Yeah, it doesn't support the <POD IP ADDRESS>.raycluster-istio-headless-svc.default.svc.cluster.local that is referenced in https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/istio.html.
I haven't tested if there are other viable workarounds
Hi @andrewsykim and @t-indumathy,
I found a note regarding hostname A records for Pods here: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields. In short, when a Pod sets spec.hostname and spec.subdomain and a headless Service with the same name as the subdomain exists in the namespace, the cluster DNS serves an A/AAAA record for the Pod's fully qualified hostname.
So the current workaround, if your kube-dns doesn't support the old Pod IP FQDN, is setting the hostname and subdomain fields on Pods manually to generate hostname A records.
This workaround is quite awkward since we can have only ONE replica in a worker group in order to set Pod hostnames manually. Here is a modified example from the KubeRay Istio tutorial that applies the workaround in the headGroupSpec and the workerGroupSpecs.
kubectl apply -f - <<EOF
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-istio
spec:
  rayVersion: '2.35.0'
  headGroupSpec:
    rayStartParams:
      num-cpus: '1'
      node-manager-port: '6380'
      object-manager-port: '6381'
      runtime-env-agent-port: '6382'
      dashboard-agent-grpc-port: '6383'
      dashboard-agent-listen-port: '52365'
      metrics-export-port: '8080'
      max-worker-port: '10012'
      node-ip-address: $(hostname --fqdn)
    template:
      spec:
        hostname: head
        subdomain: raycluster-istio-headless-svc
        containers:
        - name: ray-head
          image: rayproject/ray:2.35.0
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 1
    groupName: small-group
    rayStartParams:
      num-cpus: '1'
      node-manager-port: '6380'
      object-manager-port: '6381'
      runtime-env-agent-port: '6382'
      dashboard-agent-grpc-port: '6383'
      dashboard-agent-listen-port: '52365'
      metrics-export-port: '8080'
      max-worker-port: '10012'
      node-ip-address: $(hostname --fqdn)
    template:
      spec:
        hostname: worker-1
        subdomain: raycluster-istio-headless-svc
        containers:
        - name: ray-worker
          image: rayproject/ray:2.35.0
EOF
Thanks @rueian ! Will test this.