[BUG] Compiling Error: ModuleNotFoundError: No module named 'cmake' and failed to set dynamic section sizes: bad value
Describe the bug
I am trying to install deepspeed with:
DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-j8"
The build fails for both ninja and deepspeed.
Log output
For ninja:
Building wheel for ninja (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for ninja (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [13 lines of output]
Traceback (most recent call last):
File "/mnt/data/conda/envs/deepspeed/bin/cmake", line 5, in <module>
from cmake import cmake
ModuleNotFoundError: No module named 'cmake'
Traceback (most recent call last):
File "/tmp/pip-build-env-a3h_bgq2/overlay/lib/python3.8/site-packages/skbuild/setuptools_wrap.py", line 645, in setup
cmkr = cmaker.CMaker(cmake_executable)
File "/tmp/pip-build-env-a3h_bgq2/overlay/lib/python3.8/site-packages/skbuild/cmaker.py", line 148, in __init__
self.cmake_version = get_cmake_version(self.cmake_executable)
File "/tmp/pip-build-env-a3h_bgq2/overlay/lib/python3.8/site-packages/skbuild/cmaker.py", line 105, in get_cmake_version
raise SKBuildError(msg) from err
Problem with the CMake installation, aborting build. CMake executable is /mnt/data/conda/envs/deepspeed/bin/cmake
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for ninja
For more information: importing cmake from the interpreter in the same environment works without error:
(deepspeed) root@a08720c6-3543-4062-b274:/mnt/home/deepspeed# python
Python 3.8.16 (default, Mar 2 2023, 03:21:46)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cmake import cmake
>>>
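A possible explanation, offered as an assumption rather than a confirmed diagnosis: the traceback shows that /mnt/data/conda/envs/deepspeed/bin/cmake is a Python launcher script (it runs `from cmake import cmake`), and the ninja wheel is built inside pip's isolated build environment (the /tmp/pip-build-env-... paths), where the environment's site-packages may not be importable. That would be consistent with the import working in the normal interpreter but failing during the build. A minimal sketch for checking whether the cmake on PATH is such a launcher; the helper names `is_python_launcher` and `describe_cmake` are hypothetical:

```python
import shutil
from typing import Optional


def is_python_launcher(path: str) -> bool:
    """Return True if the file at `path` starts with a shebang that names Python."""
    with open(path, "rb") as f:
        first_line = f.readline(256)
    return first_line.startswith(b"#!") and b"python" in first_line.lower()


def describe_cmake(executable: str = "cmake") -> Optional[str]:
    """Describe the `executable` found on PATH, or return None if it is missing."""
    path = shutil.which(executable)
    if path is None:
        return None
    if is_python_launcher(path):
        kind = "Python launcher script"
    else:
        kind = "native binary or non-Python script"
    return f"{path}: {kind}"


if __name__ == "__main__":
    print(describe_cmake() or "cmake not found on PATH")
```

If the result is a Python launcher script, the executable only works where the `cmake` Python package can be imported, which may not hold inside an isolated build.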
For deepspeed (the full log is too long to include; the relevant errors are):
/usr/local/cuda/bin/nvcc -Icsrc/includes -I/mnt/data/conda/envs/deepspeed/lib/python3.8/site-packages/torch/include -I/mnt/data/conda/envs/deepspeed/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/mnt/data/conda/envs/deepspeed/lib/python3.8/site-packages/torch/include/TH -I/mnt/data/conda/envs/deepspeed/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/mnt/data/conda/envs/deepspeed/include/python3.8 -c csrc/random_ltd/pt_binding.cpp -o build/temp.linux-x86_64-cpython-38/csrc/random_ltd/pt_binding.o -O3 -std=c++14 -g -Wno-reorder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=random_ltd_op -D_GLIBCXX_USE_CXX11_ABI=0
nvcc fatal : Unknown option '-Wno-reorder'
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/deepspeed
creating build/lib.linux-x86_64-cpython-38/deepspeed/ops
g++ -pthread -B /mnt/data/conda/envs/deepspeed/compiler_compat -Wl,--sysroot=/ -pthread -shared -B /mnt/data/conda/envs/deepspeed/compiler_compat -L/mnt/data/conda/envs/deepspeed/lib -Wl,-rpath=/mnt/data/conda/envs/deepspeed/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-cpython-38/csrc/utils/flatten_unflatten.o -L/mnt/data/conda/envs/deepspeed/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-38/deepspeed/ops/utils_op.cpython-38-x86_64-linux-gnu.so
/mnt/data/conda/envs/deepspeed/compiler_compat/ld: build/temp.linux-x86_64-cpython-38/csrc/utils/flatten_unflatten.o: relocation R_X86_64_TPOFF32 against hidden symbol `_ZZN8pybind116handle15inc_ref_counterEmE7counter' can not be used when making a shared object
/mnt/data/conda/envs/deepspeed/compiler_compat/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
and
g++ -pthread -B /mnt/data/conda/envs/deepspeed/compiler_compat -Wl,--sysroot=/ -pthread -shared -B /mnt/data/conda/envs/deepspeed/compiler_compat -L/mnt/data/conda/envs/deepspeed/lib -Wl,-rpath=/mnt/data/conda/envs/deepspeed/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-cpython-38/csrc/transformer/cublas_wrappers.o build/temp.linux-x86_64-cpython-38/csrc/transformer/dropout_kernels.o build/temp.linux-x86_64-cpython-38/csrc/transformer/ds_transformer_cuda.o build/temp.linux-x86_64-cpython-38/csrc/transformer/gelu_kernels.o build/temp.linux-x86_64-cpython-38/csrc/transformer/general_kernels.o build/temp.linux-x86_64-cpython-38/csrc/transformer/normalize_kernels.o build/temp.linux-x86_64-cpython-38/csrc/transformer/softmax_kernels.o build/temp.linux-x86_64-cpython-38/csrc/transformer/transform_kernels.o -L/mnt/data/conda/envs/deepspeed/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-38/deepspeed/ops/transformer/stochastic_transformer_op.cpython-38-x86_64-linux-gnu.so
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
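One reading of the log above, which is an assumption and not confirmed in this thread: the first command invokes nvcc on a .cpp source (csrc/random_ltd/pt_binding.cpp) with g++-style flags, and nvcc rejects -Wno-reorder because it does not accept that host-compiler option directly. A rough sketch for scanning a saved build log for such invocations; the function name `nvcc_compiling_cpp` is hypothetical, and it assumes the source file follows `-c` as in the commands shown above:

```python
import shlex
from typing import List, Tuple


def nvcc_compiling_cpp(log_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (compiler, source) pairs where nvcc was invoked on a .cpp file."""
    hits = []
    for line in log_lines:
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Not a well-formed shell command (e.g. unbalanced quotes in a linker message).
            continue
        if not tokens or not tokens[0].endswith("nvcc"):
            continue
        # Assumption: the source file is the argument that follows -c.
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "-c" and value.endswith(".cpp"):
                hits.append((tokens[0], value))
    return hits
```

Any hit marks a compile step worth comparing against a build without the extra build_ext options.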
To Reproduce
- Create an env using conda with python=3.8 and activate it
- Install pytorch 1.13.1
- The system cmake version is 3.10, so I ran pip install cmake==3.26.3 and soft-linked the system cmake to this one.
- Install libaio
apt install ninja-build
- Install triton using
pip install triton==1.0.0
- Run
DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-j8"
- Got error:
fatal error: cuda_profiler_api.h: No such file or directory
- Checked the issues and found #2682, so I followed the reply there:
export PATH=/usr/local/cuda/bin:$PATH
- Run
DS_BUILD_OPS=1 pip install deepspeed --global-option="build_ext" --global-option="-j8"
- Got the errors shown above
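The cuda_profiler_api.h step in the list above can be checked before starting a build. A minimal sketch, assuming the header lives under include/ in the CUDA toolkit root and that the root is either given by the CUDA_HOME environment variable or sits two directory levels above the nvcc found on PATH; the helper names are hypothetical, and this is a simplification rather than the lookup logic PyTorch or DeepSpeed actually use:

```python
import os
import shutil
from typing import Optional


def guess_cuda_home() -> Optional[str]:
    """Guess the CUDA toolkit root: CUDA_HOME if set, else two levels above nvcc on PATH."""
    home = os.environ.get("CUDA_HOME")
    if home:
        return home
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    # e.g. /usr/local/cuda/bin/nvcc -> /usr/local/cuda (symlinks are resolved first)
    return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))


def has_profiler_header(cuda_home: str) -> bool:
    """Return True if include/cuda_profiler_api.h exists under `cuda_home`."""
    return os.path.isfile(os.path.join(cuda_home, "include", "cuda_profiler_api.h"))


if __name__ == "__main__":
    home = guess_cuda_home()
    if home is None:
        print("No CUDA toolkit found: CUDA_HOME is unset and nvcc is not on PATH")
    else:
        print(home, "has cuda_profiler_api.h:", has_profiler_header(home))
```

With /usr/local/cuda/bin missing from PATH and CUDA_HOME unset, the first function returns None, which matches the situation before the export step.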
Expected behavior
Successfully installed.
ds_report output
N/A

System info (please complete the following information):
- OS: Ubuntu 18.04
- GPU count and types: one machine with one A100
- Python version: 3.8.16
- Torch 1.13.1
- cuda 11.2
- cmake 3.26.3
- triton 1.0.0
Docker context
I wasn't using Docker.
I solved this by simply removing the additional options, i.e.
DS_BUILD_OPS=1 pip install deepspeed
This is what the DeepSpeed README suggests.