Kai-Hsun Chen
> At a quick glance, it seems that we create an ActiveExpectationItem for each Pod's creation, deletion, or update. I have some concerns about the scalability bottleneck caused by the...
@Eikykun, thank you for following up! Sorry for the late review. I had concerns about merging such a large change before Ray Summit. Now, I have enough time to verify...
cc @MortalHappiness can you also give this PR a pass of review?
Merged. Thank you for the contribution!
What is the relationship between this issue and KubeRay? It seems like a Ray Train issue.
Hi @abatilo, thank you for opening the issue. You may have a misunderstanding about GCS FT. You can read https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/kuberay-gcs-ft.html#kuberay-gcs-ft for more details. Currently, the only use case of GCS...
My current understanding is that Ray Train provides some degree of fault tolerance. * If a Ray worker Pod crashes, Ray Train will launch new Ray tasks or actors, allowing...
I will open a PR to change the `ray job submit` behavior. Currently, if we use the same submission ID for multiple `ray job submit` commands, only the first one...
https://github.com/ray-project/ray/pull/45498
There is some pushback from the Ray community on https://github.com/ray-project/ray/pull/45498. The KubeRay community has several possible solutions. Defer this to v1.3.0.