[BUG]No module named 'colossalai._C.cpu_adam':
🐛 Describe the bug
GPU: 8*A6000 CUDA Version: 11.7 Python Version: 3.8.10 colossalai Version: 0.2.8
When I run examples/train_sft.sh, the following error occurs:
Traceback (most recent call last):
  File "/root/alpaca_test/ColossalAI/colossalai/kernel/op_builder/builder.py", line 159, in load
    op_module = self.import_op()
  File "/root/alpaca_test/ColossalAI/colossalai/kernel/op_builder/builder.py", line 110, in import_op
    return importlib.import_module(self.prebuilt_import_path)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'colossalai._C.cpu_adam'
Environment
No response
Only the colossalai_gemini and colossalai_zero2 strategies have this bug.
too
Have you tried installing from source? Or try the command CUDA_EXT=1 pip install colossalai to install the library.
If you have solved the issue, kindly share your approach for newcomers!
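Before reinstalling, it may help to confirm whether the prebuilt kernel module from the traceback is actually present in your environment. The following is only a minimal sketch: the helper name has_module is hypothetical, and the module path is copied from the error message above.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located.

    find_spec imports the parent packages of a dotted name, but not the
    final module itself, so a missing parent also counts as "not found".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # A parent package (for example colossalai or colossalai._C)
        # is missing or failed to import.
        return False


if __name__ == "__main__":
    # Module path taken from the ModuleNotFoundError in the report.
    name = "colossalai._C.cpu_adam"
    print(name, "found" if has_module(name) else "missing")
```

If this prints "missing", the installed package was most likely built without the prebuilt extension, which is the case the CUDA_EXT=1 install is meant to address.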
Same problem. Can anyone help?
Install apex first; then it works.
How do I solve this issue? I have the same problem. Please help, thanks.
Just install apex first.
You need g++ version 5 or later; the default g++ on CentOS 7.5 is 4.8.5. After installing apex, the original error is gone, but this error appears instead:
    from apex import amp
  File "/home/elven/.local/lib/python3.9/site-packages/apex/__init__.py", line 13, in <module>
    from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
Oh, no! You cannot install apex with pip install apex. Install it from source instead:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir ./
Same error here. Does this method require installing apex before colossalai? When I install colossalai from source with "CUDA_EXT=1 pip install .", it says "Your compiler (c++ 4.8.5) may be ABI-incompatible with PyTorch! Please use a compiler that is ABI-compatible with GCC 5.0 and above". Only "pip install ." works, but then I cannot train. After installing apex from source, the same error appears.
I have solved this problem: I installed scl and then gcc 7, and now it runs. Thanks!
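Since several reports in this thread come down to the compiler being older than GCC 5, a quick version check before building can save a failed install. This is a rough sketch, not part of colossalai: the function names are hypothetical, and it assumes the c++ command on your PATH is GCC and accepts -dumpversion.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_gcc_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number, e.g. '4.8.5' -> (4, 8, 5)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def meets_gcc5(version: Tuple[int, ...]) -> bool:
    """True if the major version is at least 5, as the build warning asks."""
    return version[0] >= 5


def detect_compiler_version(compiler: str = "c++") -> Optional[Tuple[int, ...]]:
    """Ask the compiler for its version; None if it cannot be run."""
    try:
        result = subprocess.run(
            [compiler, "-dumpversion"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    version = detect_compiler_version()
    if version is None:
        print("could not run c++")
    else:
        print(version, "ok" if meets_gcc5(version) else "too old, need GCC 5+")
```

Newer GCC releases may print only the major version for -dumpversion, which is why the parser accepts a bare number such as 7.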
In reply to the suggestion to install with CUDA_EXT=1 pip install colossalai:
Exception: [extension] Failed to build PyTorch extension because the detected CUDA version (12.0) mismatches the version that was used to compile PyTorch (11.3). Please make sure you have set the CUDA_HOME correctly and installed the correct PyTorch in https://pytorch.org/get-started/locally/ .
Hi @yifanhunter, it looks like there is something wrong with your CUDA_HOME. Can you try export CUDA_HOME=/usr/local/cuda ?
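To see which CUDA toolkit CUDA_HOME actually points at, and how it compares with the CUDA version PyTorch was built with, something like the sketch below may help. The helper names are hypothetical, the /usr/local/cuda fallback is just the path suggested above, and flagging only a major-version difference is an assumption rather than a description of the exact check colossalai performs.

```python
import os
import re
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Pull (major, minor) out of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version: str) -> Tuple[int, int]:
    """Turn a string such as '11.3' into (11, 3)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_major(toolkit: Tuple[int, int], torch_cuda: Tuple[int, int]) -> bool:
    """Assumption: only a differing major version counts as a mismatch."""
    return toolkit[0] == torch_cuda[0]


def toolkit_version(cuda_home: str) -> Optional[Tuple[int, int]]:
    """Run the nvcc found under cuda_home; None if it cannot be run."""
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    try:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    print("CUDA_HOME:", cuda_home)
    print("toolkit version:", toolkit_version(cuda_home))
    try:
        import torch
        # torch.version.cuda is None for CPU-only builds.
        print("PyTorch built with CUDA:", torch.version.cuda)
    except ImportError:
        print("PyTorch is not installed")
```

With the versions from the exception above, same_major((12, 0), (11, 3)) is False, which matches the reported mismatch.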