Do I need an AMD graphics card to compile the ROCm version of DeepSpeed?
I successfully compiled the Linux CU129 version of DeepSpeed without using an Nvidia graphics card. Then I tried to build a ROCm version of DeepSpeed, but it failed. I built it in GitHub Actions, so I'm not sure whether it's a configuration issue or whether an AMD graphics card is necessary. Error:
Attempting to remove deepspeed/git_version_info_installed.py
Attempting to remove dist
Attempting to remove build
Attempting to remove deepspeed.egg-info
No hostfile exists at /job/hostfile, installing locally
Building deepspeed wheel
* Getting build dependencies for wheel...
[2025-09-16 09:51:10,895] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (override)
[2025-09-16 09:51:11,133] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (override)
[2025-09-16 09:51:14,953] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
Traceback (most recent call last):
DS_BUILD_OPS=1
File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 143, in get_requires_for_build_wheel
return hook(config_settings)
File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
self.run_setup()
File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 522, in run_setup
super().run_setup(setup_script=setup_script)
File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in run_setup
exec(code, locals())
File "<string>", line 201, in <module>
File "/home/runner/work/actions-builder/actions-builder/op_builder/builder.py", line 709, in builder
{'cxx': self.strip_empty_entries(self.cxx_args()), \
File "/home/runner/work/actions-builder/actions-builder/op_builder/builder.py", line 401, in strip_empty_entries
return [x for x in args if len(x) > 0]
File "/home/runner/work/actions-builder/actions-builder/op_builder/builder.py", line 401, in <listcomp>
return [x for x in args if len(x) > 0]
TypeError: object of type 'NoneType' has no len()
ERROR Backend subprocess exited when trying to invoke get_requires_for_build_wheel
Error on line 155
Fail to install deepspeed
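For context, the TypeError in the traceback can be reproduced in isolation (a sketch based on the log above; `strip_empty_entries` is quoted from op_builder/builder.py, and the `None` entry stands in for what `get_cuda_compile_flag()` returns on ROCm builds <= 0.17.5):

```python
def strip_empty_entries(args):
    # Original implementation from op_builder/builder.py: calls len() on every entry.
    return [x for x in args if len(x) > 0]

try:
    strip_empty_entries(["-O3", "", None])
except TypeError as e:
    print(e)  # object of type 'NoneType' has no len()

def strip_empty_entries_safe(args):
    # Defensive variant that also drops None entries; PR #7521 avoids the crash
    # differently, by making get_cuda_compile_flag() return a flag instead of None.
    return [x for x in args if x is not None and len(x) > 0]

print(strip_empty_entries_safe(["-O3", "", None]))  # ['-O3']
```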
Runner workflow:
- run: |
uname -m && cat /etc/*release
uname -srmv
wget https://repo.radeon.com/amdgpu-install/6.4.3/ubuntu/jammy/amdgpu-install_6.4.60403-1_all.deb
sudo apt install ./amdgpu-install_6.4.60403-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
sudo apt install rocm
printenv
clinfo
- name: install dep
run: |
source $CONDA/etc/profile.d/conda.sh
conda activate ./.venv
pip install torch==${{env.TorchVersion}} torchaudio==${{env.TorchVersion}} --index-url https://download.pytorch.org/whl/rocm6.4
pip install wheel ninja packaging py-cpuinfo psutil
pip install tqdm pydantic msgpack hjson einops
pip install numpy build
pip install -r requirements/requirements-dev.txt
pip install -r requirements/requirements.txt
- name: build
run: |
source $CONDA/etc/profile.d/conda.sh
conda activate ./.venv
echo "hello"
DS_ACCELERATOR=cuda LD_LIBRARY_PATH=/opt/rocm-6.4.3/lib PATH=$PATH:/opt/rocm-6.4.3/bin DS_BUILD_SPARSE_ATTN=0 NCCL_DEBUG=INFO DS_BUILD_OPS=1 DS_BUILD_STRING="+rocm" ./install.sh
For <=0.17.5 you need to edit ./ops/op_builder/builder.py in /usr/local/lib/python3.12/dist-packages/deepspeed/ (or similar).
The function get_cuda_compile_flag must not return None, so for example you can rewrite it as:
def get_cuda_compile_flag(self):
    return '-D__DISABLE_CUDA__'
See https://github.com/deepspeedai/DeepSpeed/pull/7521
thanks
May I ask which version of ROCm was used for compilation? I encountered the following error after compiling with version 6.4.3, and I'm not sure whether it's a version issue or whether I'm missing some dependencies.
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_warp_sync_functions.h:235:52: error: static assertion failed due to requirement 'sizeof(unsigned int) == 8': The mask must be a 64-bit integer. Implicitly promoting a smaller integer is almost always an error.
235 | __hip_internal::is_integral<MaskT>::value && sizeof(MaskT) == 8,
| ^~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/../hip_linear/include/../../hip_linear/include/../../hip_linear/include/utils_paralleldequant.cuh:124:21: note: in instantiation of function template specialization '__shfl_sync<unsigned int, unsigned int>' requested here
124 | Scales[i] = __shfl_sync(0xffffffff, tmpReg, i, 4);
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_warp_sync_functions.h:235:66: note: expression evaluates to '4 == 8'
235 | __hip_internal::is_integral<MaskT>::value && sizeof(MaskT) == 8,
| ~~~~~~~~~~~~~~^~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
50 | hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
| ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:108:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 1>, __half>' requested here
108 | Kernel_Ex<TilingConfig<4, 1, 1>, half>(
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
| ^ ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
50 | hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
| ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:112:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 2>, __half>' requested here
112 | Kernel_Ex<TilingConfig<4, 1, 2>, half>(
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
| ^ ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
50 | hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
| ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:116:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 4>, __half>' requested here
116 | Kernel_Ex<TilingConfig<4, 1, 4>, half>(
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
| ^ ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
50 | hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
| ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:120:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 8>, __half>' requested here
120 | Kernel_Ex<TilingConfig<4, 1, 8>, half>(
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
| ^ ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
50 | hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
| ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:139:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 1>, float>' requested here
139 | Kernel_Ex<TilingConfig<4, 1, 1>, float>(stream,
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
| ^ ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
50 | hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
| ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:151:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 2>, float>' requested here
151 | Kernel_Ex<TilingConfig<4, 1, 2>, float>(stream,
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
| ^ ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
50 | hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
| ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:163:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 4>, float>' requested here
163 | Kernel_Ex<TilingConfig<4, 1, 4>, float>(stream,
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
| ^ ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
50 | hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
| ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:175:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 8>, float>' requested here
175 | Kernel_Ex<TilingConfig<4, 1, 8>, float>(stream,
| ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
| ^ ~~~~~~~~~~~~~~~~
9 warnings and 9 errors generated when compiling for gfx1030.
failed to execute:/opt/rocm-6.4.3/lib/llvm/bin/clang++ --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1200 --offload-arch=gfx1201 -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/includes -I/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/torch/include -I/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm-6.4.3/include -I/home/runner/work/actions-builder/actions-builder/.venv/include/python3.10 -c -x hip deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip -o "build/temp.linux-x86_64-cpython-310/deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.o" -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DHIP_ENABLE_WARP_SYNC_BUILTINS=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=6 -DROCM_VERSION_MINOR=4 
-DROCM_WAVEFRONT_SIZE=32 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -fno-gpu-rdc
error: command '/opt/rocm-6.4.3/bin/hipcc' failed with exit code 1
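The first diagnostic above boils down to a 32-bit literal being passed where a 64-bit mask is required: with -DHIP_ENABLE_WARP_SYNC_BUILTINS=1 (visible in the failing hipcc command), HIP's `__shfl_sync` static-asserts `sizeof(MaskT) == 8`, but the `0xffffffff` mask deduces to `unsigned int`. A stdlib sketch of the size mismatch the compiler reports as "4 == 8":

```python
import struct

# Native sizes on a typical LP64 Linux host:
print(struct.calcsize("I"))  # 4 -- unsigned int, the type deduced for 0xffffffff
print(struct.calcsize("Q"))  # 8 -- unsigned long long, what the builtin requires
```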
Maybe restrict your long `--offload-arch` list via environment variables (see Dockerfile below).
Host config:
kernel 6.8.12-10-pve
ROCk module version 6.12.12 is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.15
Runtime Ext Version: 1.7
(rocminfo)
repo is https://repo.radeon.com/rocm/apt/6.4 jammy main
rocm:
Installed: 6.4.0.60400-47~22.04
card is gfx1100 (W7900)
But actually I'm running it in Docker, which works great.
FROM rocm/vllm:rocm6.4.1_vllm_0.10.0_20250812
ENV AMDGPU_TARGETS=gfx1100
RUN echo '#!/bin/bash\necho gfx1100' > /opt/rocm/llvm/bin/amdgpu-arch && chmod 755 /opt/rocm/llvm/bin/amdgpu-arch
RUN apt-get update && apt-get install -y mc strace libopenmpi-dev
WORKDIR /root
RUN git clone -b rocm_enabled_multi_backend https://github.com/ROCm/bitsandbytes.git
RUN cd bitsandbytes/ && cmake -DGPU_TARGETS="gfx1100" -DCMAKE_HIP_ARCHITECTURES=gfx1100 -DBNB_ROCM_ARCH="gfx1100" -DCOMPUTE_BACKEND=hip -S . && make && pip install -e .
RUN pip install 'datasets>=3.4.1' 'sentencepiece>=0.2.0' tqdm psutil 'wheel>=0.42.0' 'accelerate>=0.34.1' 'peft>=0.7.1,!=0.11.0' einops mpi4py diffusers hjson transformers==4.53.3
WORKDIR /root
RUN git clone https://github.com/ROCm/xformers.git
RUN cd xformers/ && git submodule update --init --recursive && git checkout 0f0bb9d && PYTORCH_ROCM_ARCH=gfx1100 python setup.py install
ENV FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
WORKDIR /root
RUN git clone https://github.com/ROCm/flash-attention.git
RUN cd flash-attention && git checkout main_perf && python setup.py install
deepspeed via pip (0.17.5) or from git (pip install .) works.
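Outside Docker, one plausible way to restrict the offload-arch list is the PYTORCH_ROCM_ARCH environment variable, which torch's extension-build helpers consult when compiling HIP ops (a sketch; gfx1100 is an example target, substitute your card's architecture):

```shell
# Limit HIP op builds to a single GPU architecture instead of the full
# gfx900..gfx1201 list seen in the failing hipcc invocation.
export PYTORCH_ROCM_ARCH=gfx1100
echo "building for: $PYTORCH_ROCM_ARCH"
# then run the build as before, e.g.:
# DS_ACCELERATOR=cuda DS_BUILD_OPS=1 ./install.sh
```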
Thank you for your reply. However, even after limiting the devices, I still encounter the same error, and I don't know why. I'll try modifying the image to see if it works.
@wszgrcy - did the linked PR resolve your issue from the DeepSpeed error at least?
After making the modifications, new problems have arisen; I'm currently looking for solutions.
May I ask if you could share the build log?
I also used the vLLM image and executed the build command, but still got the same error.
But I don't have an AMD graphics card, so I don't know if that's the problem.
This is also why I opened this issue: I don't know what DeepSpeed relies on when building. No information such as the ROCm version, torch version, or Python version can be found in the code.
After some testing, it seems the whole issue is that the torch version needs to correspond to the ROCm version, but I'm currently unsure how to do that. There was an exception when using 2.7.0 directly in the image, which disappeared after upgrading to 2.8.0, but there was still an issue: https://github.com/deepspeedai/DeepSpeed/issues/7565#issuecomment-3305379549 So maybe I haven't found the right ROCm version yet?
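To check which ROCm release a given torch wheel was built against, ROCm builds of PyTorch expose `torch.version.hip`; comparing its major.minor against the system ROCm is a reasonable sanity check. A stdlib sketch of that comparison (the version strings below are illustrative, not taken from this build):

```python
def rocm_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a ROCm/HIP version string like '6.4.43482'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

# On a real system: torch_hip would come from torch.version.hip, and
# system_rocm from e.g. /opt/rocm/.info/version.
torch_hip = "6.4.43482-d62f6a171"
system_rocm = "6.4.3"

print(rocm_major_minor(torch_hip) == rocm_major_minor(system_rocm))  # True
```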
I've been using rocm/vllm docker images as a base for quite some time now; I don't think there is a problem. Please try "pip install -U deepspeed" and provide the output of ds_report (even without a GPU).
pip install -U deepspeed
I have identified the location of the error and reviewed the source code. If I'm not mistaken, this exception should occur 100% of the time, unless ROCm behaves normally in a certain version (but I've tried several minor versions and it's the same...) https://github.com/deepspeedai/DeepSpeed/issues/7590 Can you run ds_report and let me see which features have been compiled?
Here output without GPU:
[2025-09-25 12:35:06,500] [WARNING] [real_accelerator.py:209:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.12/dist-packages/torch']
torch version .................... 2.7.0+gitf717b2a
deepspeed install path ........... ['/usr/local/lib/python3.12/dist-packages/deepspeed']
deepspeed info ................... 0.17.6, unknown, unknown
deepspeed wheel compiled w. ...... torch 2.7
shared memory (/dev/shm) size .... 2.00 GB
thanks