Kai-Hsun Chen comments

Results: 306 comments of Kai-Hsun Chen


[Bug] Containers exit with OOM and Error in ray-cluster.autoscaler.yaml

[Possible Solution]

1. Update [ray-operator/Dockerfile](https://github.com/ray-project/kuberay/blob/master/ray-operator/Dockerfile#L18) to `RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 GO111MODULE=on go build -a -o manager main.go`
2. Build a multi-architecture image for KubeRay with [docker/buildx](https://github.com/docker/buildx).
3. Build a multi-architecture...

[Feature] [Helm] Work out versioning and release story for Helm charts

See #557 for more details.

[Feature] [Helm] Work out versioning and release story for Helm charts

TODO: Host stable charts in a separate repo.

[Feature] Support go1.18+

cc @DmitriGekhtman

[logger] sync to file

@Jeffwan is this PR ready to merge? The merge is blocked by your change requests. Thank you!

[Bug] KubeRay operator periodically crashes retrieving resource lock

The following links may be useful.

* https://support.hashicorp.com/hc/en-us/articles/4404634420755-Why-am-I-seeing-context-deadline-exceeded-errors
* https://stackoverflow.com/questions/75148975/leaderelections-failing-lease-unable-to-be-renewed-automatically
* https://discuss.kubernetes.io/t/kubeadm-init-fails-kube-scheduler-fails-with-error-retrieving-resource-lock-kube-system-kube-scheduler-context-deadline-exceeded-client-timeout-exceeded-while-awaiting-headers/24389/2

Would you mind conducting two experiments:

* Experiment 1: Increase the memory limit/request for the KubeRay operator Pod....

[Bug] Ray cluster terminates more worker pods than the amount of replica scale down requested

[Bug] GKE CSI Fuse Mounts prevent worker pod creation

[Bug] GKE CSI Fuse Mounts prevent worker pod creation

[Feature] Finalizer to block deletion of RayCluster with running jobs