
[BUG]: ModuleNotFoundError: No module named 'colossalai.nn.optimizer.zero_optimizer'

Open heya5 opened this issue 3 years ago • 2 comments

🐛 Describe the bug

I installed ColossalAI with pip install colossalai==0.1.11rc3+torch1.12cu11.3 -f https://release.colossalai.org and then followed the tutorial at https://github.com/hpcaitech/ColossalAI/tree/main/examples/tutorial#-run-opt-finetuning-and-inference. Just running bash ./run_clm_synthetic.sh fails with the following error:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/he.yan/ColossalAI/examples/tutorial/opt/opt/run_clm.py:46 in <module>                      │
│                                                                                                  │
│    43 from colossalai.core import global_context as gpc                                          │
│    44 from colossalai.logging import disable_existing_loggers, get_dist_logger                   │
│    45 from colossalai.nn.optimizer import HybridAdam                                             │
│ ❱  46 from colossalai.nn.optimizer.zero_optimizer import ZeroOptimizer                           │
│    47 from colossalai.nn.parallel import ZeroDDP                                                 │
│    48 from colossalai.tensor import ProcessGroup                                                 │
│    49 from colossalai.utils import get_current_device, get_dataloader                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: No module named 'colossalai.nn.optimizer.zero_optimizer'

Environment

Python 3.8.15, torch 1.12, CUDA 11.3

heya5 · Nov 21 '22 08:11

Hi @heya5, that's strange. Let me try to reproduce this error.

FrankLeeeee · Nov 22 '22 09:11

I think the latest examples import colossalai.nn.optimizer.zero_optimizer, but the released package doesn't include that module. By the way, does the OPT model in your library implement model parallelism (tensor parallelism or pipeline parallelism)?
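A minimal diagnostic sketch (not from the original thread) for checking that diagnosis: it reports the installed colossalai version and whether the module named in the traceback can be found. The helper names module_available and installed_version are hypothetical; only importlib.util.find_spec and importlib.metadata.version (standard library, Python 3.8+) are real APIs. Note that find_spec imports the parent packages of a dotted name, so it runs colossalai's package initialization code.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def module_available(dotted_name: str) -> bool:
    """Return True if the import system can locate `dotted_name`.

    find_spec() imports the parent packages of a dotted name, so a missing
    (or broken) parent raises ImportError; treat that as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:  # includes ModuleNotFoundError
        return False


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution's version string, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("colossalai version:", installed_version("colossalai"))
    # The two modules imported at lines 46 and 47 of run_clm.py in the traceback.
    for name in (
        "colossalai.nn.optimizer.zero_optimizer",
        "colossalai.nn.parallel",
    ):
        print(f"{name}: {'found' if module_available(name) else 'MISSING'}")
```

If this prints 0.1.11rc3 and reports zero_optimizer as MISSING, that would suggest the problem lies in the installed wheel rather than in the tutorial script.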

heya5 · Nov 22 '22 17:11

Yes, the release seems to have some problems. I am preparing a new release, and the corrected version should be available for download by the end of today. As for your other question, the OPT model does not support tensor parallelism or pipeline parallelism, because it is the Hugging Face implementation.

FrankLeeeee · Nov 23 '22 07:11

Hi @heya5, a new patch has been released. You can download version v0.1.11rc4 from our website https://www.colossalai.org/download. I have tested the tutorial and it works fine. Let me know if you encounter further issues.

# example
pip install colossalai==0.1.11rc4+torch1.12cu11.3 -f https://release.colossalai.org
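After reinstalling, one way to confirm that the intended build is in place is to parse its version string. This sketch assumes the "+torch<ver>cu<ver>" local label seen in the pip command above; the names BuildTag, parse_build and torch_matches are hypothetical and not part of ColossalAI.

```python
import re
from typing import NamedTuple, Optional


class BuildTag(NamedTuple):
    release: str          # public version, e.g. "0.1.11rc4"
    torch: Optional[str]  # torch version from the local label, e.g. "1.12"
    cuda: Optional[str]   # CUDA version from the local label, e.g. "11.3"


# Assumption: the local label looks like "torch<ver>cu<ver>", as in the
# pip command above. Other builds may label their wheels differently.
_LOCAL_LABEL = re.compile(r"torch(?P<torch>\d+(?:\.\d+)*)cu(?P<cuda>\d+(?:\.\d+)*)")


def parse_build(version: str) -> BuildTag:
    """Split "0.1.11rc4+torch1.12cu11.3" into release, torch and CUDA parts."""
    release, _, local = version.partition("+")
    match = _LOCAL_LABEL.fullmatch(local)
    if match is None:
        # No local label, or a label in an unexpected format.
        return BuildTag(release, None, None)
    return BuildTag(release, match.group("torch"), match.group("cuda"))


def torch_matches(tag: BuildTag, torch_version: str) -> bool:
    """True if torch_version (e.g. "1.12.1+cu113") matches the wheel's torch version."""
    if tag.torch is None:
        return False
    wanted = tag.torch.split(".")
    # Drop torch's own local label, then compare only as many components as the tag has.
    have = torch_version.partition("+")[0].split(".")
    return have[: len(wanted)] == wanted
```

For example, parse_build(importlib.metadata.version("colossalai")) should report the release 0.1.11rc4 (assuming the installed metadata keeps the local label), and torch_matches(tag, torch.__version__) checks that the wheel was built for the installed torch major/minor version.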

FrankLeeeee · Nov 24 '22 01:11