[BUG]: `pip install .` error: identifier "__hsub" is undefined
Is there an existing issue for this bug?
- [X] I have searched the existing issues
🐛 Describe the bug
I'm installing from source:
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependency
pip install -r requirements/requirements.txt
# install colossalai
BUILD_EXT=1 pip install .
When I ran the command `BUILD_EXT=1 pip install .`, it failed:
building 'colossalai._C.scaled_masked_softmax_cuda' extension
creating /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/path_to_colossalAI/ColossalAI/extensions/pybind/softmax
Emitting ninja build file /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/path_to_colossalAI/ColossalAI/extensions/pybind/softmax/scaled_masked_softmax.o.d -pthread -B /home/mahaoke/miniconda3/envs/colossalAI/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/mahaoke/miniconda3/envs/colossalAI/include -fPIC -O2 -isystem /home/mahaoke/miniconda3/envs/colossalAI/include -fPIC -I/path_to_colossalAI/ColossalAI/extensions/csrc/ -I/usr/local/cuda/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/TH -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/mahaoke/miniconda3/envs/colossalAI/include/python3.10 -c -c /path_to_colossalAI/ColossalAI/extensions/pybind/softmax/scaled_masked_softmax.cpp -o /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/path_to_colossalAI/ColossalAI/extensions/pybind/softmax/scaled_masked_softmax.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
[2/2] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/path_to_colossalAI/ColossalAI/extensions/csrc/kernel/cuda/scaled_masked_softmax_kernel.o.d -I/path_to_colossalAI/ColossalAI/extensions/csrc/ -I/usr/local/cuda/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/TH -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/mahaoke/miniconda3/envs/colossalAI/include/python3.10 -c -c /path_to_colossalAI/ColossalAI/extensions/csrc/kernel/cuda/scaled_masked_softmax_kernel.cu -o /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/path_to_colossalAI/ColossalAI/extensions/csrc/kernel/cuda/scaled_masked_softmax_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++14 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -DTHRUST_IGNORE_CUB_VERSION_CHECK -DCOLOSSAL_WITH_CUDA --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=compute_90 
-gencode=arch=compute_90,code=sm_90
FAILED: /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/path_to_colossalAI/ColossalAI/extensions/csrc/kernel/cuda/scaled_masked_softmax_kernel.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/path_to_colossalAI/ColossalAI/extensions/csrc/kernel/cuda/scaled_masked_softmax_kernel.o.d -I/path_to_colossalAI/ColossalAI/extensions/csrc/ -I/usr/local/cuda/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/TH -I/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/mahaoke/miniconda3/envs/colossalAI/include/python3.10 -c -c /path_to_colossalAI/ColossalAI/extensions/csrc/kernel/cuda/scaled_masked_softmax_kernel.cu -o /path_to_colossalAI/ColossalAI/build/temp.linux-x86_64-cpython-310/path_to_colossalAI/ColossalAI/extensions/csrc/kernel/cuda/scaled_masked_softmax_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++14 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -DTHRUST_IGNORE_CUB_VERSION_CHECK -DCOLOSSAL_WITH_CUDA --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=compute_90 
-gencode=arch=compute_90,code=sm_90
nvcc warning : incompatible redefinition for option 'std', the last value of this option was used
/path_to_colossalAI/ColossalAI/extensions/csrc/funcs/binary_functor.h(59): error: identifier "__hsub" is undefined
/path_to_colossalAI/ColossalAI/extensions/csrc/funcs/binary_functor.h(68): error: identifier "__hadd2" is undefined
/path_to_colossalAI/ColossalAI/extensions/csrc/funcs/binary_functor.h(112): error: identifier "__hmul" is undefined
/path_to_colossalAI/ColossalAI/extensions/csrc/funcs/binary_functor.h(116): error: identifier "__hmul2" is undefined
4 errors detected in the compilation of "/path_to_colossalAI/ColossalAI/extensions/csrc/kernel/cuda/scaled_masked_softmax_kernel.cu".
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
subprocess.run(
File "/home/mahaoke/miniconda3/envs/colossalAI/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
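The errors trace back to the `-gencode` targets in the nvcc command above: FP16 intrinsics such as `__hsub`, `__hadd2`, and `__hmul` only exist for compute capability 5.3 and higher, while the build also targets `compute_52`. A small script (a hypothetical helper, just for illustration) that flags the offending arch in the gencode list:

```python
# The compute archs from the failing nvcc command's -gencode flags.
GENCODE_ARCHS = [52, 60, 61, 70, 75, 80, 86, 90]

# __half intrinsics (__hsub, __hadd2, __hmul, ...) require cc >= 5.3.
HALF_INTRINSICS_MIN_ARCH = 53

def archs_missing_half_support(archs):
    """Return the target archs that cannot compile __half intrinsics."""
    return [a for a in archs if a < HALF_INTRINSICS_MIN_ARCH]

print(archs_missing_half_support(GENCODE_ARCHS))  # -> [52]
```

This is consistent with the fix reported later in the thread, which guards the intrinsics behind an `__CUDA_ARCH__ > 520` check.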
Environment
CUDA 11.8, PyTorch 2.3.0, Python 3.10.14, GPU A800
In addition, `pip install -r requirements/requirements.txt` downloads PyTorch built for CUDA 12, so after running that command I uninstalled it and installed PyTorch built for CUDA 11.8.
Hi, what is your nvcc version?
I changed the image and the problem was solved. But after solving it, I encountered new problems, and I gave up.
I solved this bug by modifying ColossalAI/extensions/csrc/funcs/binary_functor.h, changing line 58 to:
#if defined(COLOSSAL_WITH_CUDA) && (__CUDA_ARCH__ > 520)
[Attention] Environment: CUDA 12.1, PyTorch 2.4.0, Python 3.9, GPU A800