
[BUG]: Pipeline Parallel in ChatGPT examples

Open · mcc311 opened this issue Mar 12 '23

🐛 Describe the bug

Hi, does anyone know how to do pipeline parallelism in the ChatGPT examples? I tried to set the config via:

colossalai.launch_from_torch(config=CONFIG)

After I ran:

$ torchrun --standalone --nproc_per_node=2 train_reward_model.py --pretrain facebook/opt-1.3b --model opt --strategy colossalai_zero2

It failed with an error saying the default process group was initialized twice:

RuntimeError: trying to initialize the default process group twice!
                    INFO     colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1,
                             pipeline parallel size: 2, tensor parallel size: 1
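
For reference, a minimal sketch of a guard against this double initialization, assuming the colossalai_zero2 strategy already initializes the distributed environment on its own (which the "initialize ... twice" error suggests); the CONFIG values below are only illustrative, not taken from the example:

# Only launch ColossalAI's distributed environment if torch.distributed
# has not already created the default process group (e.g. by the strategy).
import colossalai
import torch.distributed as dist

# Illustrative config: 2 pipeline stages, matching the log above.
CONFIG = dict(parallel=dict(pipeline=2))

if not dist.is_initialized():
    colossalai.launch_from_torch(config=CONFIG)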

Slack Message

Environment

GPU: V100 x 2

mcc311 avatar Mar 12 '23 03:03 mcc311

I made some comments on our Slack channel, which you may check out.

JThh avatar Mar 12 '23 13:03 JThh

We are working on pipeline parallel support for ChatGPT; it may take some time.

Fazziekey avatar Mar 14 '23 01:03 Fazziekey

Hi @Fazziekey, regarding "we are working on pipeline parallel support for ChatGPT; it may take some time":
Are there any related examples now? I am facing the same problem: for large models that cannot fit on a single GPU, the model parameters need to be sharded across multiple GPUs.

taishiciR avatar Mar 30 '23 06:03 taishiciR

same error, any updates?

evi-Genius avatar Apr 06 '23 03:04 evi-Genius

Hi @evi-Genius @taishiciR, as mentioned in the Chat example, --strategy colossalai_gemini or colossalai_zero2 is enough for most cases. PP is not supported for Chat currently; it is relatively low on our priority list for open-source planning. If you need customized in-depth cooperation or support, please send the details to [email protected].

We have also updated a lot, so please check the latest code. This issue was closed due to inactivity. Thanks.
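
For reference, the launch command from the original report with one of the recommended strategies would look like this (only the --strategy flag changes; all other flags as above):

$ torchrun --standalone --nproc_per_node=2 train_reward_model.py --pretrain facebook/opt-1.3b --model opt --strategy colossalai_gemini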

binmakeswell avatar May 05 '23 08:05 binmakeswell