[ROCm] Get rid of RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES
Since v2.6.0, torch respects ROCR_VISIBLE_DEVICES during AMD GPU device discovery (https://github.com/pytorch/pytorch/pull/144026), so we no longer need to always set RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES. However, we now need to pop the ROCR_VISIBLE_DEVICES that Ray sets when we initialize RayWorkerWrapper: vLLM still uses CUDA_VISIBLE_DEVICES on ROCm, and setting both at the same time causes a conflict.
In https://github.com/ray-project/ray/commit/3b9e729f6a669ffd85190f901f5e262af79771b0, Ray replaced its AMD device env var with HIP_VISIBLE_DEVICES, so we now also need to pop HIP_VISIBLE_DEVICES.
We also need to delay evaluating USE_ROWWISE_TORCH_SCALED_MM until the first time it is needed: at import time, CUDA_VISIBLE_DEVICES may not yet be set properly.
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.
🚀
#15244 has already been merged; is this still relevant?
This PR also covers more of the env var management.
Could you describe how you test this PR and the results?
The PR was tested via https://github.com/OpenRLHF/OpenRLHF/blob/fd81b5ed1d4212e9a34f545aae4955ab1c5a17e2/openrlhf/trainer/ray/vllm_engine.py, with RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1 unset, distributed_executor_backend set to ray, and tensor_parallel_size = 2. Everything works correctly with this PR.
Without this PR, if we don't set RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1, it produces the following error:
Error executing method 'init_worker'. This might cause deadlock in distributed execution.
Traceback (most recent call last):
File "vllm/vllm/worker/worker_base.py", line 612, in execute_method
return run_method(self, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "vllm/vllm/utils.py", line 2221, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "site-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
return method(self, *_args, **_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "vllm/vllm/worker/worker_base.py", line 594, in init_worker
self.worker = worker_class(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "vllm/vllm/worker/worker.py", line 82, in __init__
self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
^^^^^^^^^^^^^^^^^
File "vllm/vllm/worker/model_runner.py", line 1062, in __init__
self.attn_backend = get_attn_backend(
^^^^^^^^^^^^^^^^^
File "vllm/vllm/attention/selector.py", line 95, in get_attn_backend
return _cached_get_attn_backend(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "vllm/vllm/attention/selector.py", line 154, in _cached_get_attn_backend
return resolve_obj_by_qualname(attention_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "vllm/vllm/utils.py", line 1904, in resolve_obj_by_qualname
module = importlib.import_module(module_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "vllm/vllm/attention/backends/rocm_flash_attn.py", line 27, in <module>
_GPU_ARCH = torch.cuda.get_device_properties("cuda").gcnArchName
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "site-packages/torch/cuda/__init__.py", line 576, in get_device_properties
_lazy_init() # will define _get_device_properties
^^^^^^^^^^^^
File "site-packages/torch/cuda/__init__.py", line 372, in _lazy_init
torch._C._cuda_init()
RuntimeError: No HIP GPUs are available
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @HollowMan6.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
cc @mgoin for discussions under https://github.com/vllm-project/vllm/pull/12695#discussion_r2478925809
/gemini review
@codex review
Codex Review: Didn't find any major issues. :tada:
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".