[BUG]: The colossalai model parallel strategy in the code under ColossalAI/applications/ChatGPT/examples reports an error.
🐛 Describe the bug
Take train_prompts.py under that path as an example.
On a single GPU, python train_prompts.py --model opt --pretrain opt-125m runs fine, and larger models can also be trained.
But launching with the provided shell script, torchrun --standalone --nproc_per_node=2 train_prompts.py prompts.csv --model opt --pretrain opt-125m/ --strategy colossalai_gemini --train_batch_size 1, fails with an error.
The error message is as follows:
Do any parameters need to be changed when running the script? How can I locate and fix the problem?
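For context, the example scripts of that era dispatched the --strategy flag to a strategy object roughly as sketched below. This is a hedged sketch: the import path chatgpt.trainer.strategies and the exact class names are assumptions based on the ChatGPT example code at the time and may differ in newer releases.

```python
# Hedged sketch of how --strategy is typically dispatched in the
# ChatGPT examples; class names/imports are assumptions, not the
# authoritative current API.
from chatgpt.trainer.strategies import (
    ColossalAIStrategy, DDPStrategy, NaiveStrategy)

def build_strategy(name: str):
    if name == 'naive':
        return NaiveStrategy()
    if name == 'ddp':
        return DDPStrategy()
    if name == 'colossalai_gemini':
        # ZeRO stage 3 with Gemini's dynamic placement of parameters
        return ColossalAIStrategy(stage=3, placement_policy='cuda')
    if name == 'colossalai_zero2':
        return ColossalAIStrategy(stage=2)
    raise ValueError(f'Unsupported strategy "{name}"')
```

Under this dispatch, the single-GPU run and the torchrun colossalai_gemini run exercise very different code paths, which is why only the distributed launch fails.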
Environment
CUDA = 11.2
Python = 3.7.3
PyTorch = 1.13.1
Hi @Qian0733, thank you for your feedback, but we can't reproduce your bug. It seems there's something wrong with your environment. Can you give us more information about it? Thank you. (We also suggest you update to our newest code and try again.)
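When reporting back, a quick environment dump helps maintainers compare setups. Note that PyTorch 1.13.1 ships wheels built against CUDA 11.6/11.7, so a system toolkit of 11.2 is worth double-checking, since ColossalAI compiles CUDA extensions against it. A minimal report using only standard Python/PyTorch APIs:

```python
# Minimal environment report using only standard APIs; paste the
# output into the issue so maintainers can compare setups.
import platform
import torch

print('python     :', platform.python_version())
print('torch      :', torch.__version__)
print('torch cuda :', torch.version.cuda)        # CUDA the wheel was built with
print('cuda ok    :', torch.cuda.is_available())
print('gpu count  :', torch.cuda.device_count())
if torch.cuda.is_available():
    print('gpu name   :', torch.cuda.get_device_name(0))
```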
I ran into the same problem. Have you solved it? Also, with strategy=ddp the script runs normally, but the wall-clock time and GPU memory usage are the same whether I run on multiple GPUs or a single GPU! Have you encountered this?
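For what it's worth, identical per-GPU memory under DDP is expected: DDP keeps a full model replica on every rank, and only sharded strategies such as colossalai_gemini (ZeRO) reduce per-GPU memory. Identical wall-clock time usually means every rank is iterating the full dataset; sharding the data with a DistributedSampler lets an epoch finish proportionally faster. A minimal sketch using only standard PyTorch APIs (the toy dataset and sizes are illustrative):

```python
# Why DDP can look "no faster" than one GPU: DDP replicates the model,
# so per-GPU memory matching the single-GPU run is expected. Speedup
# comes from sharding the *data*; without a DistributedSampler every
# rank processes the full dataset and an epoch takes single-GPU time.
# Run with: torchrun --standalone --nproc_per_node=2 this_file.py
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group(backend='nccl')

dataset = TensorDataset(torch.randn(1024, 16))   # toy data, illustrative
sampler = DistributedSampler(dataset)            # each rank gets 1/world_size
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)                     # reshuffle shards per epoch
    for (batch,) in loader:
        pass                                     # training step goes here
```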
I hit the same problem when running train_sft.py.
We have updated the code a lot; please check the latest version. This issue was closed due to inactivity. Thanks.