[BUG]: The colossalai model-parallel strategy fails for the code under ColossalAI/applications/ChatGPT/examples.

Open Qian0733 opened this issue 1 year ago • 4 comments

🐛 Describe the bug

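Take train_prompts.py under that path as an example. On a single GPU, python train_prompts.py --model opt --pretrain opt-125m runs fine, and larger models can be trained as well. However, running the command from the official shell script, torchrun --standalone --nproc_per_node=2 train_prompts.py prompts.csv --model opt --pretrain opt-125m/ --strategy colossalai_gemini --train_batch_size 1, fails. The error message is as follows: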

[Image: single-node multi-GPU error]

Do any parameters need to be changed when using the script? How can I locate and fix the problem?
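
When locating a failure that only appears under torchrun, a common first step is to confirm that each worker process sees the rank variables the launcher exports. The sketch below is not part of the example scripts. The helper name read_dist_env is hypothetical, and it only reads the RANK, LOCAL_RANK and WORLD_SIZE environment variables that torchrun sets, falling back to single-process defaults when they are absent.

```python
import os


def read_dist_env(environ=None):
    """Return the rank information torchrun exports to each worker.

    Hypothetical debugging helper. Falls back to single-process
    defaults when the variables are missing (e.g. a plain `python` launch).
    """
    env = os.environ if environ is None else environ
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    info = read_dist_env()
    print(f"rank {info['rank']} of {info['world_size']} (local rank {info['local_rank']})")
```

Launched with --nproc_per_node=2, each of the two workers should report a world size of 2 and a different rank. If that holds, the launcher itself is probably fine and the problem is more likely in the strategy setup or the environment.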

Environment

CUDA = 11.2, Python = 3.7.3, PyTorch = 1.13.1

Qian0733 avatar Mar 01 '23 06:03 Qian0733

Hi @Qian0733, thank you for your feedback. We can't reproduce this bug, so it looks like something may be wrong with your environment. Can you give us more information about it? We also suggest updating to our latest code and trying again. Thank you.

ht-zhou avatar Mar 03 '23 09:03 ht-zhou
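
The extra information requested above could be gathered with a short script like the following. This is a minimal sketch, not something the maintainers prescribed: the function name collect_env is made up for illustration, and it reports only the Python version, the platform, and, if PyTorch is importable, its version, CUDA build and visible GPU count.

```python
import platform
import sys


def collect_env():
    """Gather basic version info to paste into a bug report.

    torch is imported lazily so the function still works on a
    machine where PyTorch is not installed.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with
    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

PyTorch also ships a more thorough report via python -m torch.utils.collect_env. Pasting the full traceback as text alongside either report tends to make a bug easier to reproduce than a screenshot.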

I ran into the same problem. Have you solved it? Also, with strategy=ddp the script runs normally, but the run time and GPU memory usage with multiple GPUs are the same as with a single GPU! Have you seen this as well?

zhaochs1995 avatar Mar 10 '23 10:03 zhaochs1995
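
One possible explanation for the DDP observation above, offered as an assumption rather than a diagnosis: data-parallel training keeps a full copy of the model on every GPU, so per-GPU memory and per-step time stay roughly the same as on one GPU. Any speed-up comes from each step covering more samples, which only shows up if the dataset is split across ranks. The sketch below works through that arithmetic. The function name and the sample numbers are made up for illustration.

```python
import math


def steps_per_epoch(num_samples, per_gpu_batch, world_size, sharded=True):
    """Optimizer steps each rank runs per epoch under data parallelism.

    If the data is sharded across ranks, each rank sees 1/world_size of
    the samples. If it is not, every rank iterates the full dataset and
    adding GPUs saves no time.
    """
    per_rank_samples = math.ceil(num_samples / world_size) if sharded else num_samples
    return math.ceil(per_rank_samples / per_gpu_batch)


one_gpu = steps_per_epoch(1000, per_gpu_batch=1, world_size=1)
two_gpu = steps_per_epoch(1000, per_gpu_batch=1, world_size=2)
two_gpu_unsharded = steps_per_epoch(1000, per_gpu_batch=1, world_size=2, sharded=False)
print(one_gpu, two_gpu, two_gpu_unsharded)  # prints: 1000 500 1000
```

If the time per epoch does not drop with two GPUs, it may be worth checking whether the training script uses a distributed sampler. Whether the example scripts discussed in this thread do so is not stated here.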

I ran into the same problem when running train_sft.py.

GXKIM avatar Apr 04 '23 06:04 GXKIM

We have updated the code a lot since this was reported. Please check the latest version. This issue was closed due to inactivity. Thanks.

binmakeswell avatar Apr 27 '23 07:04 binmakeswell