Andrew Sy Kim

214 comments by Andrew Sy Kim

From my previous testing of KubeRay and Istio, it's not supported with kube-dns because kube-dns does not support Pod FQDN resolution for headless services. This is, however, supported with CoreDNS...

Yeah, it doesn't support the `.raycluster-istio-headless-svc.default.svc.cluster.local` Pod FQDN suffix that is referenced in https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/istio.html. I haven't tested whether there are other viable workarounds.
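
For anyone trying to reproduce this, here is a minimal sketch of how the Pod FQDN under a headless service is put together and how you might check from inside a pod whether the cluster DNS resolves it. The helper names `pod_fqdn` and `resolves`, and the pod hostname in the usage note, are hypothetical; the name layout (`<pod-hostname>.<service>.<namespace>.svc.<cluster-domain>`) assumes the default `cluster.local` cluster domain.

```python
import socket
from typing import List, Optional


def pod_fqdn(pod_hostname: str, service: str, namespace: str,
             cluster_domain: str = "cluster.local") -> str:
    """Build the per-Pod DNS name under a headless service.

    Assumes the Pod's hostname/subdomain are set so that the subdomain
    matches the headless service name.
    """
    return f"{pod_hostname}.{service}.{namespace}.svc.{cluster_domain}"


def resolves(fqdn: str) -> Optional[List[str]]:
    """Return the sorted IPs the name resolves to, or None if lookup fails."""
    try:
        infos = socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        # The DNS server returned no record for this name.
        return None
    return sorted({info[4][0] for info in infos})
```

Calling `resolves(pod_fqdn("my-head-pod", "raycluster-istio-headless-svc", "default"))` from a pod in the cluster (with `my-head-pod` replaced by a real pod hostname) returns `None` when the DNS provider has no record for the per-Pod name.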

In the case where a Node is down for long enough and Kubernetes evicts the pod (i.e. deletes it), there should be some logic in KubeRay operator to recreate the...

> This issue is asking for the KubeRay operator to do the same thing as the replicaset controller. I suspect that if we ran the head pod as a single-replica...

> My suggestion would be to modify the operator code to create a new pod if the restart policy is Always and the pod enters Failed state. The comments [here](https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/raycluster_controller.go#L838-L841)...

I would feel okay with removing the [isRestartPolicyAlways](https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/raycluster_controller.go#L837) check. It seems like in most normal conditions the Pod phase never flips to `Failed`, so the deletion would never happen, but...
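
To make the proposal concrete, here is a rough sketch of the decision being discussed, written in Python rather than the operator's Go for brevity. It is not the actual KubeRay code: `PodState`, `should_recreate`, and the `skip_when_restart_always` flag are hypothetical names, and the flag only models whether a restart-policy guard like `isRestartPolicyAlways` is kept or removed.

```python
from dataclasses import dataclass


@dataclass
class PodState:
    phase: str           # e.g. "Pending", "Running", "Succeeded", "Failed"
    restart_policy: str  # "Always", "OnFailure", or "Never"


def should_recreate(pod: PodState, skip_when_restart_always: bool) -> bool:
    """Decide whether a controller should delete the Pod so it is recreated.

    skip_when_restart_always=True models keeping a restart-policy guard;
    False models removing it (hypothetical flag for illustration only).
    """
    if pod.phase != "Failed":
        # Nothing to do unless the Pod has reached the Failed phase.
        return False
    if skip_when_restart_always and pod.restart_policy == "Always":
        # With the guard in place, a Failed Pod with restartPolicy Always
        # is left alone.
        return False
    return True
```

With the guard, `should_recreate(PodState("Failed", "Always"), True)` is `False`, so a head pod stuck in `Failed` is never replaced; without it, the same pod is deleted and recreated. A `Running` pod is unaffected either way, which matches the point that removing the check changes nothing unless the phase actually flips to `Failed`.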

Yes, that's my understanding too, but I personally have not tested the pod status behavior during disk eviction, so it'd be good to verify it. Other than that, it seems...

@MortalHappiness what's the restart policy on your head pod? (It wasn't in the describe output.) If it's `Always` and you validated the pod state as `Failed` during disk pressure eviction, then...