ColossalAI
[BUG]: Pipeline Parallel in ChatGPT examples
🐛 Describe the bug
Hi, does anyone know how to do pipeline parallelism in the ChatGPT examples? I tried setting a pipeline-parallel config and launching with:
colossalai.launch_from_torch(config=CONFIG)
Then I ran:
$ torchrun --standalone --nproc_per_node=2 train_reward_model.py --pretrain facebook/opt-1.3b --model opt --strategy colossalai_zero2
It raised an error saying the process group was initialized twice:
RuntimeError: trying to initialize the default process group twice!
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1,
pipeline parallel size: 2, tensor parallel size: 1
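The error suggests the default process group is being initialized twice: the --strategy colossalai_zero2 setup inside train_reward_model.py already initializes the distributed environment, and the extra colossalai.launch_from_torch(config=CONFIG) call then tries to initialize it again. Below is a minimal sketch of a guard, assuming a hypothetical pipeline config reconstructed from the log above (pipeline size 2, tensor size 1); it only avoids the duplicate-initialization error and does not by itself make the Chat example pipeline-parallel.

import colossalai
import torch.distributed as dist

# Hypothetical config reconstructed from the log: 2 pipeline stages, no tensor parallelism.
CONFIG = dict(parallel=dict(pipeline=2, tensor=dict(size=1, mode=None)))

# Only launch if nothing has initialized torch.distributed yet; a second
# initialization is what raises "trying to initialize the default process
# group twice!".
if not dist.is_initialized():
    colossalai.launch_from_torch(config=CONFIG)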
Environment
GPU: V100 x 2
I made some comments on our Slack channel, which you may check out.
We are working on pipeline parallel support for ChatGPT; it may take some time.
Hi @Fazziekey, regarding "we are working on pipeline parallel support for ChatGPT; it may take some time": are there any related examples now? I am facing the same problem: for large models that cannot be loaded on a single GPU, the model parameters need to be sharded across multiple GPUs.
same error, any updates?
Hi @evi-Genius @taishiciR, as mentioned in the Chat example, --strategy colossalai_gemini or colossalai_zero2 is enough for most cases. PP is not supported for Chat currently, and it is relatively low on our priority list for open-source planning. If you need customized in-depth cooperation or support, please send the details to [email protected] We have also updated a lot since then; please check the latest code. This issue was closed due to inactivity. Thanks.
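For readers wondering why the ZeRO-based strategies are sufficient without pipeline parallelism: instead of splitting the model into stages, ZeRO keeps a full model replica per GPU and shards the training state across the data-parallel ranks. The sketch below is a generic illustration using PyTorch's built-in ZeroRedundancyOptimizer, which shards only the optimizer state (roughly ZeRO stage 1); it is not ColossalAI's implementation, and the colossalai_zero2 strategy additionally shards gradients. The toy Linear layer is a stand-in for the reward model.

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

# Run with: torchrun --standalone --nproc_per_node=2 this_script.py
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Toy stand-in for the reward model; each rank holds a full replica.
model = DDP(torch.nn.Linear(1024, 1024).cuda(),
            device_ids=[torch.cuda.current_device()])

# Optimizer state is partitioned across ranks instead of replicated,
# which is where most of the memory saving comes from.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,
    lr=1e-5,
)

# One dummy training step to show the flow.
loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()

As far as I understand, the colossalai_gemini strategy goes further by managing parameters in chunks with CPU offloading, which is how models larger than a single GPU's memory can be trained without a pipeline split.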