Kai-Hsun Chen

Results 323 comments of Kai-Hsun Chen

> If I'm understanding correctly I'd still have to make changes to min_replicas in a config file and redeploy --- but that's already possible by setting min_workers in my cluster...

Hi @guox18, would you mind sharing more details about properly setting CUDA_VISIBLE_DEVICES? For example, how many actors do you have, and how do they map to GPU devices?

@PeterSH6, would you mind adding a "ray" label to this issue? I am triaging Ray-related issues in veRL. Thanks!

Hi @guox18, I don't know how veRL launches a Ray cluster, but there are two methods:

* Method 1: `os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3"`

```python
import os
import ray
import torch
...
```
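To make the actor-to-GPU mapping question concrete, here is a minimal sketch of what the `CUDA_VISIBLE_DEVICES = "1,2,3"` setting from Method 1 implies. It uses only the standard library and does not call Ray or PyTorch. The helper name `visible_device_map` is hypothetical and is not part of Ray, veRL, or CUDA. It illustrates only that logical device indices are renumbered from 0 over the listed physical IDs.

```python
import os


def visible_device_map(env=None):
    """Map logical CUDA indices to the physical IDs listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only. Returns None when the variable
    is unset (no restriction is applied in that case).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are kept as strings because they may be indices or device UUIDs.
    ids = [tok.strip() for tok in raw.split(",") if tok.strip()]
    return dict(enumerate(ids))


# The variable must be set before any library initializes CUDA in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3"
mapping = visible_device_map()
print(mapping)  # {0: '1', 1: '2', 2: '3'}
```

With this setting, a process that asks for logical device 0 gets physical GPU 1, which is why the number of actors and their device assignment matter when debugging.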

cc @pradeep-anyscale The KEP to expose topology information

@zhaochenyang20 maybe you want to ping @KivenChen? 😆

cc @andrewsykim Would you mind commenting on this issue? Without your comment, I am unable to assign the issue to you. Thanks!

> Actually, one thing to consider here is that users may not want to use emptyDir for the ray logs. They may want to persist them with remote storage or...

I checked with the Ray team.

1. Ray logs are not always stored in `/tmp/ray`. It is configurable by setting `_temp_dir`. See [this doc](https://docs.ray.io/en/latest/_modules/ray/_private/worker.html#) for more details.
2. Most users...
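As a rough illustration of the `_temp_dir` point, the sketch below computes where the log directory would land for a given temp directory. It assumes the usual `session_latest/logs` layout under the temp directory. The helper name `ray_log_dir` and the example path `/mnt/ray-tmp` are hypothetical, and the snippet does not start Ray.

```python
import posixpath

# Ray's default temp directory, as mentioned in the comment above.
DEFAULT_RAY_TEMP_DIR = "/tmp/ray"


def ray_log_dir(temp_dir=None):
    """Return the expected log directory for a given Ray temp directory.

    Hypothetical helper; assumes logs live under <temp_dir>/session_latest/logs.
    """
    base = temp_dir or DEFAULT_RAY_TEMP_DIR
    return posixpath.join(base, "session_latest", "logs")


# A driver would pass the same directory to Ray, e.g. ray.init(_temp_dir="/mnt/ray-tmp")
# (not executed here).
print(ray_log_dir())                # /tmp/ray/session_latest/logs
print(ray_log_dir("/mnt/ray-tmp"))  # /mnt/ray-tmp/session_latest/logs
```

Any volume mount meant to capture the logs would need to follow the configured directory instead of hard-coding `/tmp/ray`.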

> This means slim Ray image will only work with new versions of Kuberay but that's OK.

What's the image tag? Does `ray-project/ray:X.Y.Z` still have `wget`?