
Do I need an AMD graphics card to compile the ROCm version of DeepSpeed?

Open wszgrcy opened this issue 3 months ago • 12 comments

I have successfully compiled the Linux cu129 (CUDA 12.9) version of DeepSpeed without using an Nvidia graphics card. Then I tried to build the ROCm version of DeepSpeed, but it failed. I built it in GitHub Actions, so I'm not sure whether it's a configuration issue or whether an AMD graphics card is necessary. Error:

Attempting to remove deepspeed/git_version_info_installed.py
Attempting to remove dist
Attempting to remove build
Attempting to remove deepspeed.egg-info
No hostfile exists at /job/hostfile, installing locally
Building deepspeed wheel
* Getting build dependencies for wheel...
[2025-09-16 09:51:10,895] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (override)
[2025-09-16 09:51:11,133] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (override)
[2025-09-16 09:51:14,953] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
Traceback (most recent call last):
DS_BUILD_OPS=1
  File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
    main()
  File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in main
    json_out["return_val"] = hook(**hook_input["kwargs"])
  File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 143, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=[])
  File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
    self.run_setup()
  File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 522, in run_setup
    super().run_setup(setup_script=setup_script)
  File "/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in run_setup
    exec(code, locals())
  File "<string>", line 201, in <module>
  File "/home/runner/work/actions-builder/actions-builder/op_builder/builder.py", line 709, in builder
    {'cxx': self.strip_empty_entries(self.cxx_args()), \
  File "/home/runner/work/actions-builder/actions-builder/op_builder/builder.py", line 401, in strip_empty_entries
    return [x for x in args if len(x) > 0]
  File "/home/runner/work/actions-builder/actions-builder/op_builder/builder.py", line 401, in <listcomp>
    return [x for x in args if len(x) > 0]
TypeError: object of type 'NoneType' has no len()

ERROR Backend subprocess exited when trying to invoke get_requires_for_build_wheel
Error on line 155
Fail to install deepspeed

GitHub Actions runner workflow:

      - run: |
          uname -m && cat /etc/*release
          uname -srmv

          wget https://repo.radeon.com/amdgpu-install/6.4.3/ubuntu/jammy/amdgpu-install_6.4.60403-1_all.deb
          sudo apt install ./amdgpu-install_6.4.60403-1_all.deb
          sudo apt update
          sudo apt install python3-setuptools python3-wheel
          sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
          sudo apt install rocm
          printenv
          clinfo
      - name: install dep
        run: |
          source $CONDA/etc/profile.d/conda.sh
          conda activate ./.venv
          pip install torch==${{env.TorchVersion}} torchaudio==${{env.TorchVersion}} --index-url https://download.pytorch.org/whl/rocm6.4
          pip install wheel ninja packaging py-cpuinfo psutil
          pip install tqdm pydantic msgpack hjson einops
          pip install numpy  build
          pip install -r requirements/requirements-dev.txt
          pip install -r requirements/requirements.txt

      - name: build
        run: |
          source $CONDA/etc/profile.d/conda.sh
          conda activate ./.venv
          echo "hello"
          DS_ACCELERATOR=cuda LD_LIBRARY_PATH=/opt/rocm-6.4.3/lib PATH=$PATH:/opt/rocm-6.4.3/bin DS_BUILD_SPARSE_ATTN=0 NCCL_DEBUG=INFO DS_BUILD_OPS=1  DS_BUILD_STRING="+rocm" ./install.sh

wszgrcy commented Sep 16 '25

For <=0.17.5 you need to edit ./ops/op_builder/builder.py in /usr/local/lib/python3.12/dist-packages/deepspeed/ (or similar)

The function get_cuda_compile_flag must not return None so for example you can rewrite it as

    def get_cuda_compile_flag(self):
        return '-D__DISABLE_CUDA__'

see https://github.com/deepspeedai/DeepSpeed/pull/7521
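
For context, the reason a None return breaks the build is that strip_empty_entries() filters the compile-flag list with len(x), and len(None) raises exactly the TypeError in the traceback above. A minimal sketch reproducing just that failure mode (illustrative only, not DeepSpeed code):

    # strip_empty_entries() filters flags with len(x); a None entry (what
    # get_cuda_compile_flag() returned on <=0.17.5 ROCm builds) raises the same TypeError
    python3 -c "args = ['-O3', None]; [x for x in args if len(x) > 0]"
    # -> TypeError: object of type 'NoneType' has no len()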

maaaax commented Sep 17 '25

> For <=0.17.5 you need to edit ./ops/op_builder/builder.py … The function get_cuda_compile_flag must not return None … see #7521

thanks

wszgrcy commented Sep 17 '25

> For <=0.17.5 you need to edit ./ops/op_builder/builder.py … The function get_cuda_compile_flag must not return None … see #7521

May I ask which version of ROCm you used for compilation? I encountered the following error after compiling with version 6.4.3, and I'm not sure whether it's a version issue or whether I'm missing some dependencies.

/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_warp_sync_functions.h:235:52: error: static assertion failed due to requirement 'sizeof(unsigned int) == 8': The mask must be a 64-bit integer. Implicitly promoting a smaller integer is almost always an error.
  235 |       __hip_internal::is_integral<MaskT>::value && sizeof(MaskT) == 8,
      |                                                    ^~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/../hip_linear/include/../../hip_linear/include/../../hip_linear/include/utils_paralleldequant.cuh:124:21: note: in instantiation of function template specialization '__shfl_sync<unsigned int, unsigned int>' requested here
  124 |         Scales[i] = __shfl_sync(0xffffffff, tmpReg, i, 4);
      |                     ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_warp_sync_functions.h:235:66: note: expression evaluates to '4 == 8'
  235 |       __hip_internal::is_integral<MaskT>::value && sizeof(MaskT) == 8,
      |                                                    ~~~~~~~~~~~~~~^~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
   50 |     hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
      |     ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:108:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 1>, __half>' requested here
  108 |                 Kernel_Ex<TilingConfig<4, 1, 1>, half>(
      |                 ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
 2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
      |            ^                   ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
   50 |     hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
      |     ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:112:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 2>, __half>' requested here
  112 |                 Kernel_Ex<TilingConfig<4, 1, 2>, half>(
      |                 ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
 2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
      |            ^                   ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
   50 |     hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
      |     ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:116:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 4>, __half>' requested here
  116 |                 Kernel_Ex<TilingConfig<4, 1, 4>, half>(
      |                 ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
 2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
      |            ^                   ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
   50 |     hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
      |     ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:120:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 8>, __half>' requested here
  120 |                 Kernel_Ex<TilingConfig<4, 1, 8>, half>(
      |                 ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
 2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
      |            ^                   ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
   50 |     hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
      |     ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:139:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 1>, float>' requested here
  139 |                 Kernel_Ex<TilingConfig<4, 1, 1>, float>(stream,
      |                 ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
 2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
      |            ^                   ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
   50 |     hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
      |     ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:151:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 2>, float>' requested here
  151 |                 Kernel_Ex<TilingConfig<4, 1, 2>, float>(stream,
      |                 ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
 2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
      |            ^                   ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
   50 |     hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
      |     ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:163:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 4>, float>' requested here
  163 |                 Kernel_Ex<TilingConfig<4, 1, 4>, float>(stream,
      |                 ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
 2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
      |            ^                   ~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:50:5: error: no matching function for call to 'hipFuncSetAttribute'
   50 |     hipFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
      |     ^~~~~~~~~~~~~~~~~~~
deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip:175:17: note: in instantiation of function template specialization 'Kernel_Ex<TilingConfig<4, 1, 8>, float>' requested here
  175 |                 Kernel_Ex<TilingConfig<4, 1, 8>, float>(stream,
      |                 ^
/opt/rocm-6.4.3/lib/llvm/bin/../../../include/hip/hip_runtime_api.h:2404:12: note: candidate function not viable: no overload of 'QUANT_GEMM_Kernel' matching 'const void *' for 1st argument
 2404 | hipError_t hipFuncSetAttribute(const void* func, hipFuncAttribute attr, int value);
      |            ^                   ~~~~~~~~~~~~~~~~
9 warnings and 9 errors generated when compiling for gfx1030.
failed to execute:/opt/rocm-6.4.3/lib/llvm/bin/clang++  --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=gfx1200 --offload-arch=gfx1201  -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/home/runner/work/actions-builder/actions-builder/deepspeed/inference/v2/kernels/includes -I/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/torch/include -I/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/runner/work/actions-builder/actions-builder/.venv/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm-6.4.3/include -I/home/runner/work/actions-builder/actions-builder/.venv/include/python3.10 -c -x hip deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.hip -o "build/temp.linux-x86_64-cpython-310/deepspeed/inference/v2/kernels/core_ops/hip_linear/linear_kernels_hip.o" -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DHIP_ENABLE_WARP_SYNC_BUILTINS=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=6 -DROCM_VERSION_MINOR=4 -DROCM_WAVEFRONT_SIZE=32 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -fno-gpu-rdc
error: command '/opt/rocm-6.4.3/bin/hipcc' failed with exit code 1

wszgrcy commented Sep 18 '25

Maybe restrict your long "--offload-arch" list via environment variables (see the Dockerfile below).

Host config:

kernel 6.8.12-10-pve

ROCk module version 6.12.12 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.15
Runtime Ext Version:     1.7
 (rocminfo)

repo is https://repo.radeon.com/rocm/apt/6.4 jammy main
rocm:
  Installed: 6.4.0.60400-47~22.04

card is gfx1100 (W7900)

But actually I am running it in Docker, which works great.

FROM rocm/vllm:rocm6.4.1_vllm_0.10.0_20250812

ENV AMDGPU_TARGETS=gfx1100
RUN echo '#!/bin/bash\necho gfx1100' > /opt/rocm/llvm/bin/amdgpu-arch && chmod 755 /opt/rocm/llvm/bin/amdgpu-arch
RUN apt-get update && apt-get install -y mc strace libopenmpi-dev

WORKDIR /root
RUN git clone -b rocm_enabled_multi_backend https://github.com/ROCm/bitsandbytes.git
RUN cd bitsandbytes/ && cmake -DGPU_TARGETS="gfx1100" -DCMAKE_HIP_ARCHITECTURES=gfx1100 -DBNB_ROCM_ARCH="gfx1100" -DCOMPUTE_BACKEND=hip -S . && make && pip install -e .

RUN pip install "datasets>=3.4.1" "sentencepiece>=0.2.0" tqdm psutil "wheel>=0.42.0" "accelerate>=0.34.1" "peft>=0.7.1,!=0.11.0" einops mpi4py diffusers hjson transformers==4.53.3

WORKDIR /root
RUN git clone https://github.com/ROCm/xformers.git
RUN cd xformers/ && git submodule update --init --recursive && git checkout 0f0bb9d && PYTORCH_ROCM_ARCH=gfx1100 python setup.py install

ENV FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
WORKDIR /root
RUN git clone https://github.com/ROCm/flash-attention.git
RUN cd flash-attention && git checkout main_perf && python setup.py install

deepspeed via pip (0.17.5) or from git (pip install .) works.
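
For a build outside that image, the same architecture restriction can be applied through the environment when DeepSpeed compiles its ops. A minimal sketch, assuming a gfx1100 card and that PyTorch's extension builder honours PYTORCH_ROCM_ARCH (substitute your own target):

    # restrict the --offload-arch list to a single target instead of the full default set
    PYTORCH_ROCM_ARCH=gfx1100 \
    DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 \
    ./install.sh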

maaaax commented Sep 18 '25

> Maybe restrict your long "--offload-arch" list via environment variables (see the Dockerfile above). […]

Thank you for your reply. However, even after limiting the devices, I still encounter the same error, and I don't know why. I'll try modifying the image to see if it works.

wszgrcy commented Sep 19 '25

@wszgrcy - did the linked PR resolve your issue from the DeepSpeed error at least?

loadams commented Sep 19 '25

> @wszgrcy - did the linked PR resolve your issue from the DeepSpeed error at least?

After making the modifications, new problems arose, and I am currently looking for solutions.

wszgrcy commented Sep 19 '25

> Maybe restrict your long "--offload-arch" list via environment variables (see the Dockerfile above). […]

May I ask if you could share your build log? I also used the vLLM image and executed the build command, but I still got the same error. I don't have an AMD graphics card, though, so I don't know whether that is the problem.

This is also the reason I opened this issue: I don't know what DeepSpeed relies on when building, because no information such as the ROCm version, torch version, Python version, etc. can be found in the code.

After some testing, it seems the whole issue is that the torch version needs to correspond to the ROCm version, but I am currently unsure how to match them. There was an exception when using 2.7.0 directly in the image, which disappeared after upgrading to 2.8.0, but another issue remained: https://github.com/deepspeedai/DeepSpeed/issues/7565#issuecomment-3305379549 So maybe we haven't found the right ROCm version yet?
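
One way to check which ROCm version a given torch wheel was actually built against is torch.version.hip, which can then be compared with the ROCm installed under /opt; a quick check that needs no GPU:

    # torch.version.hip is set on ROCm builds of PyTorch (it is None on CUDA/CPU builds)
    python3 -c "import torch; print(torch.__version__, torch.version.hip)"
    ls -d /opt/rocm-*   # system ROCm installation(s) to compare against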

wszgrcy commented Sep 24 '25

I've been using rocm/vllm Docker images as a base for quite some time now, and I don't think there is a problem there. Please try "pip install -U deepspeed" and provide the output of ds_report (even without a GPU).

maaaax commented Sep 24 '25

> pip install -U deepspeed

I have identified the location of the error and reviewed the source code. If I'm not mistaken, this exception should occur 100% of the time, unless ROCm behaves normally in some particular version (but I've tried several minor versions and it's the same...): https://github.com/deepspeedai/DeepSpeed/issues/7590 Can you run ds_report and let me see which features were compiled?

wszgrcy commented Sep 24 '25

Here is the output without a GPU:

[2025-09-25 12:35:06,500] [WARNING] [real_accelerator.py:209:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented  [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.12/dist-packages/torch']
torch version .................... 2.7.0+gitf717b2a
deepspeed install path ........... ['/usr/local/lib/python3.12/dist-packages/deepspeed']
deepspeed info ................... 0.17.6, unknown, unknown
deepspeed wheel compiled w. ...... torch 2.7 
shared memory (/dev/shm) size .... 2.00 GB

maaaax commented Sep 25 '25

> Here is the output without a GPU: […]

thanks

wszgrcy commented Sep 25 '25