[BUG] No module named 'colossalai._C.cpu_adam'

Open ifromeast opened this issue 2 years ago • 13 comments

🐛 Describe the bug

GPU: 8*A6000
CUDA Version: 11.7
Python Version: 3.8.10
colossalai Version: 0.2.8

When I run examples/train_sft.sh, the following error occurs:

Traceback (most recent call last):
  File "/root/alpaca_test/ColossalAI/colossalai/kernel/op_builder/builder.py", line 159, in load
    op_module = self.import_op()
  File "/root/alpaca_test/ColossalAI/colossalai/kernel/op_builder/builder.py", line 110, in import_op
    return importlib.import_module(self.prebuilt_import_path)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'colossalai._C.cpu_adam'

Environment

No response

ifromeast avatar Apr 01 '23 13:04 ifromeast
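
The traceback above shows the loader failing inside importlib.import_module for the prebuilt path colossalai._C.cpu_adam. A minimal sketch for checking whether that compiled module is present in an installation, using only the standard library. The helper name has_module is hypothetical and is not part of ColossalAI.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if a module spec can be found for `dotted_name`.

    Note: for a dotted name, find_spec imports the parent packages first.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised (as ModuleNotFoundError) when a parent package is missing
        # or is not a package at all.
        return False
```

Calling has_module("colossalai._C.cpu_adam") in the training environment should return False on a setup that hits the error in this issue, assuming the colossalai package itself is installed.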

Only the colossalai_gemini and colossalai_zero2 strategies have this bug.

ifromeast avatar Apr 01 '23 13:04 ifromeast

Same here.

emigmo avatar Apr 03 '23 02:04 emigmo

Have you tried installing from source? Or try the command CUDA_EXT=1 pip install colossalai to install the lib?

If you have solved the issue, kindly share your approach for newcomers!

JThh avatar Apr 05 '23 22:04 JThh

Same problem. Can anyone help?

janglichao avatar Apr 08 '23 17:04 janglichao

Install apex first; that works.

ifromeast avatar Apr 11 '23 14:04 ifromeast

How can this issue be solved? I have the same problem. Any help is appreciated, thanks.

elven2016 avatar Apr 18 '23 13:04 elven2016

Just install apex first.

ifromeast avatar Apr 18 '23 13:04 ifromeast

g++ version 5 or later is needed; the default g++ on CentOS 7.5 is 4.8.5. After installing apex, the original error is gone, but a new one appears:

ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)

    from apex import amp
  File "/home/elven/.local/lib/python3.9/site-packages/apex/__init__.py", line 13, in <module>
    from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)

elven2016 avatar Apr 18 '23 13:04 elven2016

Oh, no! You cannot install apex with pip install apex; build it from source instead:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir ./

ifromeast avatar Apr 18 '23 13:04 ifromeast

Same error. Does this method require installing apex before colossalai? When I install colossalai from source with "CUDA_EXT=1 pip install .", it reports "Your compiler (c++ 4.8.5) may be ABI-incompatible with PyTorch! Please use a compiler that is ABI-compatible with GCC 5.0 and above". Only "pip install ." works, but then I cannot train. After installing apex from source, the same error appears.

elven2016 avatar Apr 18 '23 14:04 elven2016

I have solved this problem: I installed scl and then gcc 7, and now it runs. Thanks!

elven2016 avatar Apr 18 '23 15:04 elven2016
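
The compiler warning and the fix reported above both come down to the GCC version (4.8.5 failed, 7 worked, and the warning asks for 5.0 or above). A rough sketch of checking this before building, assuming a g++ on PATH that supports -dumpversion. The function names and the (5, 0) default are illustrative choices taken from the warning text, not ColossalAI or PyTorch APIs.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def is_at_least(version_text: str, minimum: Tuple[int, ...] = (5, 0)) -> bool:
    """Compare a version string such as '4.8.5' or '7' against `minimum`."""
    parsed = tuple(int(p) for p in re.findall(r"\d+", version_text))
    # Pad short versions like '7' with zeros so tuple comparison is fair.
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded >= minimum


def compiler_version(cmd: str = "g++") -> Optional[str]:
    """Return the output of `<cmd> -dumpversion`, or None if cmd is not on PATH."""
    if shutil.which(cmd) is None:
        return None
    result = subprocess.run(
        [cmd, "-dumpversion"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, is_at_least(compiler_version() or "") reports whether the default g++ clears the threshold; on the CentOS 7.5 setup described here it would be False until a newer toolchain is enabled.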

Replying to JThh ("Have you tried installing from source? Or try the command CUDA_EXT=1 pip install colossalai to install the lib?"):

Exception: [extension] Failed to build PyTorch extension because the detected CUDA version (12.0) mismatches the version that was used to compile PyTorch (11.3).Please make sure you have set the CUDA_HOME correctly and installed the correct PyTorch in https://pytorch.org/get-started/locally/ .

yifanhunter avatar May 30 '23 05:05 yifanhunter

Hi @yifanhunter, it looks like something is wrong with your CUDA_HOME. Can you try export CUDA_HOME=/usr/local/cuda?

ifromeast avatar May 31 '23 01:05 ifromeast
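
The exception above compares the CUDA toolkit found through CUDA_HOME (12.0) with the CUDA version PyTorch was compiled against (11.3). A small sketch of the same comparison, assuming the toolkit version is read from the "release X.Y" text printed by nvcc --version and the PyTorch side from torch.version.cuda. The function names and the returned labels are hypothetical, and this is not ColossalAI's own check.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text like '... release 12.0, V12.0.76'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def compare_cuda(nvcc_output: str, torch_cuda: Optional[str]) -> str:
    """Classify toolkit vs. PyTorch CUDA versions.

    `torch_cuda` is expected to look like torch.version.cuda, e.g. '11.3',
    or None for a CPU-only PyTorch build.
    """
    toolkit = parse_nvcc_release(nvcc_output)
    if toolkit is None or not torch_cuda:
        return "unknown"
    parts = torch_cuda.split(".")
    built = (int(parts[0]), int(parts[1]) if len(parts) > 1 else 0)
    if toolkit[0] != built[0]:
        return "major-mismatch"
    if toolkit[1] != built[1]:
        return "minor-mismatch"
    return "match"
```

In practice the first argument would come from running the nvcc under $CUDA_HOME/bin with --version, and the second from torch.version.cuda. The case in this thread (12.0 against 11.3) would be classified as "major-mismatch".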