[BUG]: Error building extension 'cpu_adam'

Open Tian14267 opened this issue 2 years ago • 3 comments

🐛 Describe the bug

I get the following error when running train_sft.sh:

[extension] Compiling or loading the JIT-built cpu_adam kernel during runtime now
/root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/utils/cpp_extension.py:365: UserWarning: 

                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                              !! WARNING !!

  warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Emitting ninja build file /root/.cache/colossalai/torch_extensions/torch1.13_cu11.7/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/anaconda3/envs/coati/lib/python3.9/site-packages/colossalai/kernel/cuda_native/csrc/includes -I/usr/local/cuda/include -isystem /root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/include -isystem /root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/include/TH -isystem /root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/include/THC -isystem /root/anaconda3/envs/coati/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++14 -lcudart -lcublas -g -Wno-reorder -fopenmp -march=native -c /root/anaconda3/envs/coati/lib/python3.9/site-packages/colossalai/kernel/cuda_native/csrc/cpu_adam.cpp -o cpu_adam.o 
FAILED: cpu_adam.o 
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/anaconda3/envs/coati/lib/python3.9/site-packages/colossalai/kernel/cuda_native/csrc/includes -I/usr/local/cuda/include -isystem /root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/include -isystem /root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/include/TH -isystem /root/anaconda3/envs/coati/lib/python3.9/site-packages/torch/include/THC -isystem /root/anaconda3/envs/coati/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++14 -lcudart -lcublas -g -Wno-reorder -fopenmp -march=native -c /root/anaconda3/envs/coati/lib/python3.9/site-packages/colossalai/kernel/cuda_native/csrc/cpu_adam.cpp -o cpu_adam.o 
c++: error: unrecognized command line option ‘-std=c++14’
c++: error: unrecognized command line option ‘-std=c++14’
ninja: build stopped: subcommand failed.

Environment

Linux: CentOS Linux release 7.9.2009 (Core)
python: Python 3.9.16
pytorch: 1.13.1
CUDA: 11.2
gcc: gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)

Tian14267, Mar 31 '23 06:03

Does anybody know how to solve this?

Tian14267, Mar 31 '23 07:03

This video may help you. https://www.google.com/search?q=Error+building+extension+%27cpu_adam%27&oq=Error+building+extension+%27cpu_adam%27&aqs=chrome.0.69i59j69i60j69i61j69i60.6527j0j4&sourceid=chrome&ie=UTF-8#fpstate=ive&vld=cid:4a20d78a,vid:S2mtGxf2ZEs

Captainr22, Apr 03 '23 14:04

Hi @Tian14267, the likely cause is that the GCC version is too low or the environment is not compatible. Please check the installation instructions at https://github.com/hpcaitech/ColossalAI#installation. Thanks.

binmakeswell, Apr 18 '23 10:04
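
Note that the log and the Environment section disagree: the PyTorch warning reports `c++ 4.8.5`, while the listed gcc is 8.3.1. That suggests the `c++` found on PATH is not the newer GCC. Below is a minimal sketch for checking which compiler the JIT build would likely pick up. It assumes that `torch.utils.cpp_extension` uses the `CXX` environment variable when set and falls back to `c++` otherwise; this is worth verifying against the installed PyTorch version. The helper names (`parse_version`, `is_new_enough`, `check_compiler`) are hypothetical, and the GCC 5.0 minimum is taken from the warning in the log.

```python
import os
import re
import shutil
import subprocess

# Minimum taken from the PyTorch warning in the log ("GCC 5.0 and above").
MIN_GCC = (5, 0)


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def is_new_enough(version, minimum=MIN_GCC):
    """Compare (major, minor) of `version` against `minimum`."""
    return version is not None and version[:2] >= minimum


def check_compiler():
    # Assumption: the JIT build uses $CXX if set, otherwise `c++` from PATH.
    name = os.environ.get("CXX", "c++")
    path = shutil.which(name)
    if path is None:
        print(f"{name}: not found on PATH")
        return False
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    version = parse_version(result.stdout)
    ok = is_new_enough(version)
    print(f"{path}: version {version}, new enough: {ok}")
    return ok


if __name__ == "__main__":
    check_compiler()
```

If this reports an old compiler, one option is to point `CXX` at the newer one before launching train_sft.sh. On CentOS 7, a Red Hat build of GCC 8.3.1 is often provided by the devtoolset-8 software collection, in which case `scl enable devtoolset-8 bash` puts it first on PATH; whether that applies here is an assumption, since the issue does not say how GCC 8.3.1 was installed.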