Andrew Sy Kim
From my previous testing of KubeRay with Istio, this is not supported with kube-dns, because kube-dns does not support Pod FQDN resolution for headless services. It is, however, supported with CoreDNS...
Yeah, it doesn't support the `.raycluster-istio-headless-svc.default.svc.cluster.local` that is referenced in https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/istio.html. I haven't tested whether there are other viable workarounds.
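For anyone trying to reproduce this, here is a rough sketch of the kind of per-Pod name involved. It assumes the hostname label is the Pod IP with dots replaced by dashes, which is my reading of the linked Ray doc and not something verified here. `podFQDN` is a hypothetical helper, not part of KubeRay or Istio.

```go
package main

import (
	"fmt"
	"strings"
)

// podFQDN builds a per-Pod DNS name under a headless Service.
// Assumption: the host label is the Pod IP with "." replaced by "-",
// and the cluster domain is the default "cluster.local".
func podFQDN(podIP, service, namespace string) string {
	host := strings.ReplaceAll(podIP, ".", "-")
	return fmt.Sprintf("%s.%s.%s.svc.cluster.local", host, service, namespace)
}

func main() {
	// The Service and namespace come from the suffix quoted above;
	// the IP is made up for illustration.
	fmt.Println(podFQDN("10.0.0.12", "raycluster-istio-headless-svc", "default"))
}
```

Running a DNS lookup for a name like this from inside a Pod should show whether the cluster DNS resolves it.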
In the case where a Node is down for long enough and Kubernetes evicts the pod (i.e. deletes it), there should be some logic in the KubeRay operator to recreate the...
> This issue is asking for the KubeRay operator to do the same thing as the replicaset controller. I suspect that if we ran the head pod as a single-replica...
> My suggestion would be to modify the operator code to create a new pod if the restart policy is Always and the pod enters Failed state. The comments [here](https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/raycluster_controller.go#L838-L841)...
@xjhust can you help confirm this?
I would feel okay with removing the [isRestartPolicyAlways](https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/raycluster_controller.go#L837) check. It seems like in most normal conditions the Pod phase never flips to `Failed`, so the deletion would never happen, but...
Yes, that's my understanding too, but I personally have not tested the pod status behavior during disk eviction, so it'd be good to verify this behavior. Other than that, it seems...
> 1-pod replicaset

What's the reason for a 1-pod ReplicaSet instead of a StatefulSet?
@MortalHappiness what's the restart policy in your head pod? (it wasn't in the describe output) If it's `Always` and you validated pod state as `Failed` during disk pressure eviction, then...
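To make the restart-policy discussion above concrete, here is a minimal sketch of the two decision rules being compared. This is not the actual operator code: the types and both functions are hypothetical stand-ins, and the "with check" variant is only my assumption of what gating on `isRestartPolicyAlways` amounts to.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes pod phase and restart policy strings.
type PodPhase string
type RestartPolicy string

const (
	PodRunning PodPhase = "Running"
	PodFailed  PodPhase = "Failed"

	RestartAlways RestartPolicy = "Always"
	RestartNever  RestartPolicy = "Never"
)

// shouldDeleteWithCheck models keeping the restart-policy check (assumption):
// a pod with restart policy Always is never deleted by the operator.
func shouldDeleteWithCheck(phase PodPhase, policy RestartPolicy) bool {
	if policy == RestartAlways {
		return false
	}
	return phase == PodFailed
}

// shouldDeleteWithoutCheck models dropping the check: any Failed pod is
// deleted so that it can be recreated, regardless of restart policy.
func shouldDeleteWithoutCheck(phase PodPhase, _ RestartPolicy) bool {
	return phase == PodFailed
}

func main() {
	// The case under discussion: Always restart policy, pod ends up Failed.
	fmt.Println(shouldDeleteWithCheck(PodFailed, RestartAlways))    // false
	fmt.Println(shouldDeleteWithoutCheck(PodFailed, RestartAlways)) // true
}
```

The two rules differ only when the policy is `Always` and the phase is `Failed`, which is why confirming the pod phase during disk pressure eviction matters.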