DeepSpeed Installation instructions for ROCm absent

The installation instructions state that installing DeepSpeed is as simple as running pip install deepspeed. However, this pulls in the CUDA (NVIDIA) build of PyTorch as a dependency.
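
To confirm which PyTorch build pip actually installed, PyTorch can report it itself: torch.version.hip holds a version string on ROCm wheels, while torch.version.cuda does on CUDA wheels. A minimal sketch, assuming python3 is the interpreter of the active virtual environment; torch_build is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "rocm", "cuda" or "cpu" depending on which
# PyTorch wheel is installed, or "torch-import-failed" if python3 or torch
# cannot be run/imported.
torch_build() {
  python3 - <<'EOF' 2>/dev/null || echo "torch-import-failed"
import torch
# torch.version.hip is a version string on ROCm builds, None otherwise.
if torch.version.hip:
    print("rocm")
elif torch.version.cuda:
    print("cuda")
else:
    print("cpu")
EOF
}

torch_build
```

Run after a plain pip install deepspeed in a fresh environment, this should print cuda; after installing from the rocm6.4 index URL it should print rocm.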

If I install the ROCm build of PyTorch with:

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.4

and then run:

pip install deepspeed

ds_report fails with the following error:

[2025-09-19 13:31:26,997] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/home/user/git/openfold/.venv/bin/ds_report", line 3, in <module>
    from deepspeed.env_report import cli_main
  File "/home/user/git/openfold/.venv/lib/python3.12/site-packages/deepspeed/__init__.py", line 25, in <module>
    from . import ops
  File "/home/user/git/openfold/.venv/lib/python3.12/site-packages/deepspeed/ops/__init__.py", line 15, in <module>
    from ..git_version_info import compatible_ops as __compatible_ops__
  File "/home/user/git/openfold/.venv/lib/python3.12/site-packages/deepspeed/git_version_info.py", line 29, in <module>
    op_compatible = builder.is_compatible()
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/git/openfold/.venv/lib/python3.12/site-packages/deepspeed/ops/op_builder/fp_quantizer.py", line 35, in is_compatible
    sys_cuda_major, _ = installed_cuda_version()
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/git/openfold/.venv/lib/python3.12/site-packages/deepspeed/ops/op_builder/builder.py", line 51, in installed_cuda_version
    raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
deepspeed.ops.op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
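
The traceback ends in installed_cuda_version() in deepspeed/ops/op_builder/builder.py, which raises MissingCUDAException because CUDA_HOME cannot be found. Assuming that check relies on the CUDA_HOME value resolved by PyTorch's extension utilities, the sketch below (show_toolkit_homes is a hypothetical name) prints what torch.utils.cpp_extension reports for CUDA_HOME and ROCM_HOME in the current environment:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CUDA and ROCm toolkit locations that
# torch.utils.cpp_extension resolves, or "torch-import-failed" if python3
# or torch cannot be run/imported.
show_toolkit_homes() {
  python3 - <<'EOF' 2>/dev/null || echo "torch-import-failed"
import torch.utils.cpp_extension as ext
# Both attributes are None when the corresponding toolkit is not found.
print("CUDA_HOME =", ext.CUDA_HOME)
print("ROCM_HOME =", ext.ROCM_HOME)
EOF
}

show_toolkit_homes
```

On a machine with only ROCm installed, one would expect CUDA_HOME = None and ROCM_HOME pointing at the ROCm installation, which would be consistent with the error above; this is an assumption about a typical setup, not something confirmed in this thread.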

Sep 19 '25 11:09 BSchilperoort

@BSchilperoort - we can work on updating the ROCm instructions in the README.

The latest 0.17.5 release is broken for ROCm and we need to push a new update. I'll get to that shortly.

In the meantime, could you apply this change to the latest source and see whether you can build it locally?

https://github.com/deepspeedai/DeepSpeed/pull/7521
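
One way to try the change from this pull request before it lands in a release is to fetch it directly into a clone, since GitHub exposes every pull request as refs/pull/<number>/head. A sketch, assuming the clone's origin remote points at github.com/deepspeedai/DeepSpeed; fetch_pr and the pr-<number> branch name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch pull request <pr> from the "origin" remote of
# the clone in <repo_dir> into a local branch pr-<pr> and check it out.
fetch_pr() {
  local repo_dir="$1" pr="$2"
  git -C "$repo_dir" fetch -q origin "pull/${pr}/head:pr-${pr}" &&
    git -C "$repo_dir" checkout -q "pr-${pr}"
}
```

For example, from the directory containing the clone: fetch_pr DeepSpeed 7521, then pip install ./DeepSpeed inside the virtual environment that already has the ROCm PyTorch wheel.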

Sep 19 '25 15:09 loadams

git clone https://github.com/deepspeedai/DeepSpeed.git
cd DeepSpeed/
python3 -m venv .venv
source .venv/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.4
pip install .
ds_report

ds_report still returns the same error.

I don't see any instructions for building for ROCm from source, so I used the command from https://github.com/deepspeedai/DeepSpeed/issues/7565:

pip install build
DS_ACCELERATOR=cuda LD_LIBRARY_PATH=/opt/rocm-6.4.3/lib PATH=$PATH:/opt/rocm-6.4.3/bin DS_BUILD_SPARSE_ATTN=0 NCCL_DEBUG=INFO DS_BUILD_OPS=1  DS_BUILD_STRING="+rocm" ./install.sh

This fails with ModuleNotFoundError: No module named 'dskernels'
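
Because the install.sh command hard-codes a ROCm 6.4.3 prefix in both PATH and LD_LIBRARY_PATH, a quick pre-flight check can rule out a path mismatch. This only covers the toolkit paths and does not address the missing dskernels module. A sketch, assuming hipcc lives in the prefix's bin directory; check_rocm_prefix is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify that a ROCm prefix contains the bin
# and lib directories added to PATH/LD_LIBRARY_PATH, plus an executable
# bin/hipcc (assumed location). Returns non-zero and reports what is missing.
check_rocm_prefix() {
  local prefix="$1" status=0 dir
  for dir in "$prefix/bin" "$prefix/lib"; do
    if [ ! -d "$dir" ]; then
      echo "missing directory: $dir" >&2
      status=1
    fi
  done
  if [ ! -x "$prefix/bin/hipcc" ]; then
    echo "hipcc not found or not executable: $prefix/bin/hipcc" >&2
    status=1
  fi
  return "$status"
}
```

For example, run check_rocm_prefix /opt/rocm-6.4.3 before the install.sh line.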

Sep 19 '25 18:09 BSchilperoort

Please see my comment https://github.com/deepspeedai/DeepSpeed/issues/7565#issuecomment-3306953101 for a working docker setup.

Sep 20 '25 17:09 maaaax

Using that Dockerfile (with the GPU architecture changed to gfx1101) and then running:

sudo docker run --rm -ti --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --ipc=host deepspeed bash
root@d779a3a67ae8:~# pip install deepspeed
root@d779a3a67ae8:~# ds_report

With this setup, ds_report passes, correctly detecting ROCm and the available GPU memory.
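
The docker run command works by passing the host's ROCm device nodes (/dev/kfd and /dev/dri) into the container, so a native, non-Docker setup presumably needs the same nodes to be present and accessible. A small sketch to check that they exist; check_rocm_devices and its optional root argument (which replaces /dev, mainly for testing) are hypothetical, and it does not check group membership such as the video group added with --group-add:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the device nodes passed to the container
# with --device=/dev/kfd --device=/dev/dri exist on this host.
# An optional first argument replaces /dev as the directory to inspect.
check_rocm_devices() {
  local root="${1:-/dev}" status=0 node
  for node in kfd dri; do
    if [ ! -e "$root/$node" ]; then
      echo "missing: $root/$node" >&2
      status=1
    fi
  done
  return "$status"
}
```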

Now I need to get this working outside of Docker, because I am working on a package that uses OpenFold, which depends on DeepSpeed.

Sep 22 '25 08:09 BSchilperoort