ray
ray copied to clipboard
[Core] Make `ray.get_gpu_ids()` always return `List[int]`
Previously, ray.get_gpu_ids()
would sometimes return a List of ints and sometimes return a List of strings depending on if the user has set the CUDA_VISIBLE_DEVICES
env var, which made the API difficult to program against.
Instead, ray.get_gpu_ids()
should always return a List of ints.
Why are these changes needed?
Related issue number
Closes https://github.com/ray-project/ray/issues/28467
Checks
- [ ] I've signed off every commit(by using the -s flag, i.e.,
git commit -s
) in this PR. - [ ] I've run
scripts/format.sh
to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
Nice work!
I think the question we need to answer is do we want to maintain bug compatibility. cc @pcmoritz
I don't know the full history, but can't the cuda devices be hash identified instead of index identified? That would require a string and not an int.
Ah yes, that's right- it can also be UUIDs.
GPU identifiers are given as integer indices or as UUID strings.
From https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars.
In this case, then we should always return strings and not ints? The current behavior of sometimes strings and sometimes ints based on if the environment variable is set is a bit finnicky.
Would +1 always strings, we can explain why in the doc.
@pcmoritz thoughts about the proposed change to always return strings?
I think we should do this: Leave this function alone and introduce a new one with the new behavior. Print a deprecation warning that suggests to use the new function and tells the user what the difference is and how to transition to the new behavior (and link to the docs describing why we changed this). After a number of releases remove the old function (but let's not rush this, I'm more thinking 6 months to a year time frame).
The new function could be ray.get_runtime_context().get_gpu_ids()
. I believe it fits well with the rest of runtime context.
I do believe we need to always return strings in the new function.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
- If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you'd still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray's public slack channel.
Thanks again for opening the issue!