ray icon indicating copy to clipboard operation
ray copied to clipboard

[Core] Make `ray.get_gpu_ids()` always return `List[int]`

Open amogkam opened this issue 2 years ago • 7 comments

Previously, ray.get_gpu_ids() would sometimes return a List of ints and sometimes return a List of strings depending on if the user has set the CUDA_VISIBLE_DEVICES env var, which made the API difficult to program against.

Instead, ray.get_gpu_ids() should always return a List of ints.

Why are these changes needed?

Related issue number

Closes https://github.com/ray-project/ray/issues/28467

Checks

  • [ ] I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

amogkam avatar Sep 20 '22 03:09 amogkam

Nice work!

jovany-wang avatar Sep 20 '22 04:09 jovany-wang

I think the question we need to answer is do we want to maintain bug compatibility. cc @pcmoritz

jjyao avatar Sep 20 '22 23:09 jjyao

I don't know the full history, but can't the cuda devices be hash identified instead of index identified? That would require a string and not an int.

ericl avatar Sep 21 '22 03:09 ericl

Ah yes, that's right- it can also be UUIDs.

GPU identifiers are given as integer indices or as UUID strings.

From https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars.

In this case, then we should always return strings and not ints? The current behavior of sometimes strings and sometimes ints based on if the environment variable is set is a bit finnicky.

amogkam avatar Sep 21 '22 04:09 amogkam

Would +1 always strings, we can explain why in the doc.

ericl avatar Sep 21 '22 04:09 ericl

@pcmoritz thoughts about the proposed change to always return strings?

amogkam avatar Sep 21 '22 04:09 amogkam

I think we should do this: Leave this function alone and introduce a new one with the new behavior. Print a deprecation warning that suggests to use the new function and tells the user what the difference is and how to transition to the new behavior (and link to the docs describing why we changed this). After a number of releases remove the old function (but let's not rush this, I'm more thinking 6 months to a year time frame).

The new function could be ray.get_runtime_context().get_gpu_ids(). I believe it fits well with the rest of runtime context.

I do believe we need to always return strings in the new function.

pcmoritz avatar Sep 21 '22 04:09 pcmoritz

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale[bot] avatar Nov 07 '22 06:11 stale[bot]

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale[bot] avatar Nov 22 '22 21:11 stale[bot]