Kai-Hsun Chen
> @kevin85421 @DmitriGekhtman > > OpenShift uses GID 0 by default, so definitely not 100. Yes, 777 is the right way to support all Kubernetes distributions. Everything in your container...
777 is too permissive for SSH, so #31563 may be reverted. See #32025 for more details. Any ideas for other solutions? cc @juliusvonkohout @ijrsvt
We decided to integrate Kubeflow without this update. (https://github.com/kubeflow/manifests/pull/2383)
After the first RayJob succeeded, I deleted it and created a new one. The new RayJob starts from the newest checkpoint successfully.
@sjberman Thank you! You guys are so nice.
Hi @kate-osborn, I have already read #717. Is there anyone working on this issue? If not, could I take it? Thanks!
I had a meeting with @sfrolich this afternoon to reproduce the issue. The root cause is that KubeRay v1.1.0 introduces a new annotation, `ray.io/num-worker-groups`. Therefore, if a RayCluster...
The root cause is that v1.2.0 makes the head service headless by default. However, the K8s API server will throw an exception if we attempt to change a running ClusterIP...
Opened a PR: https://github.com/ray-project/kuberay/pull/2343
@sfrolich I will release Helm charts for 1.2.1 to disable the headless service by default, but I will not build images for KubeRay v1.2.1.