[BUG]: ModuleNotFoundError: No module named 'colossalai.kernel.op_builder'

Open lxing532 opened this issue 1 year ago • 1 comments

🐛 Describe the bug

I was trying to run torchrun --standalone --nproc_per_node=2 train_dummy.py --strategy colossalai_zero2 under applications/Chat/examples and got this error. I tried the solutions suggested in earlier related issues, but none of them worked.

Environment

Python: 3.7.10
PyTorch version: 1.12.1
CUDA toolkit version: 11.3

lxing532 avatar Apr 19 '23 05:04 lxing532

Hi, may I see your colossalai check -i output?

JThh avatar Apr 20 '23 10:04 JThh

Hi, may I see your colossalai check -i output?

I encounter the same error. Here is my colossalai check -i output: [image]

Ke51n avatar May 05 '23 02:05 Ke51n

I first installed with pip install colossalai, but when I ran the example I hit this problem: "ImportError: cannot import name 'zero_model_wrapper' from 'colossalai.zero'". So I installed from source instead:

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
pip install .

After that I ran into the bug above: no module named colossalai.kernel.op_builder.
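To tell whether the install that Python actually picks up contains the missing submodule, a quick check along these lines may help. This is only a sketch: the helper name `module_available` is made up for illustration, and it reports whether the module can be located, not why it is missing.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system.

    find_spec() imports parent packages of a dotted name, so a missing
    or broken parent package raises ImportError/ModuleNotFoundError;
    that case is treated as "not available" here.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    # Module names taken from the error messages in this thread.
    for mod in ("colossalai", "colossalai.kernel.op_builder"):
        print(mod, "->", "found" if module_available(mod) else "missing")
```

If the first name is found and the second is missing, the package itself is installed but that submodule was not included in this particular install.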

Ke51n avatar May 06 '23 03:05 Ke51n

1. Versioning: make sure PyTorch is installed to match your CUDA version.
2. Run conda install -c conda-forge gcc=9.5.0 gxx=9.5.0
3. Run CUDA_EXT=1 pip install colossalai
4. If you then get ModuleNotFoundError: No module named 'invoke.vendor.decorator', run pip install decorator
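For the versioning step, a rough way to compare the CUDA version PyTorch was built against with the local toolkit is sketched below. The helper names are hypothetical, and the sketch assumes nvcc is on PATH and prints a "release X.Y" string in its version output. How strict the match needs to be is not stated in this thread, so the script only prints both values.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_nvcc_release(nvcc_output: str) -> Optional[Version]:
    """Extract (major, minor) from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version() -> Optional[Version]:
    """CUDA toolkit version as reported by nvcc, or None if nvcc is absent."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version() -> Optional[Version]:
    """CUDA version PyTorch was built with, or None (no torch / CPU build)."""
    try:
        import torch
    except ImportError:
        return None
    if torch.version.cuda is None:
        return None
    major, minor = torch.version.cuda.split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__":
    print("PyTorch built with CUDA:", torch_cuda_version())
    print("CUDA toolkit (nvcc):    ", toolkit_cuda_version())
```

With the environment reported at the top of this issue, both lines would be expected to show (11, 3) if PyTorch was installed for CUDA 11.3.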

yifanhunter avatar Jun 01 '23 09:06 yifanhunter