Kai-Hsun Chen

Results 327 comments of Kai-Hsun Chen

> At a quick glance, it seems that we create an ActiveExpectationItem for each Pod's creation, deletion, or update. I have some concerns about the scalability bottleneck caused by the...

@Eikykun, thank you for following up! Sorry for the late review. I had concerns about merging such a large change before Ray Summit. Now, I have enough time to verify...

cc @MortalHappiness, can you also give this PR a review pass?

Merged. Thank you for the contribution!

What is the relationship between this issue and KubeRay? It seems like a Ray Train issue.

Hi @abatilo, thank you for opening the issue. You may have some misunderstanding about GCS FT. You can read https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/kuberay-gcs-ft.html#kuberay-gcs-ft for more details. Currently, the only use case of GCS...

My current understanding is that Ray Train provides some degree of fault tolerance:

* If a Ray worker Pod crashes, Ray Train will launch new Ray tasks or actors, allowing...

I will open a PR to change the `ray job submit` behavior. Currently, if we use the same submission ID for multiple `ray job submit` commands, only the first one...

https://github.com/ray-project/ray/pull/45498

There has been some pushback from the Ray community on https://github.com/ray-project/ray/pull/45498. The KubeRay community has several possible solutions. Deferring this to v1.3.0.