ColossalAI
[BUG]: Running train_prompts.py prompts.csv --strategy naive fails
🐛 Describe the bug
I downloaded prompts.csv and ran:
python train_prompts.py prompts.csv --strategy naive --lora_rank 16
Traceback (most recent call last):
File "train_prompts.py", line 122, in
RuntimeError: CUDA error: out of memory
I am using an A5000 GPU with 24 GB of memory. How much GPU memory does training require? Is there something wrong with my run arguments? Any help would be appreciated!
Environment
Using ChatGPT version 0.1.0
Bot detected the issue body's language is not English, translate it automatically.
Title: [BUG]: Failed to run train_prompts.py prompts.csv --strategy naive
Thanks for your feedback. We suggest training with the colossalai_zero2 strategy instead of naive, which may save GPU memory. You can also use train_prompt.sh as a training demo.
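For a rough sense of why a ZeRO-style strategy can reduce memory, the arithmetic below follows the mixed-precision Adam accounting commonly cited from the ZeRO paper: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. It assumes colossalai_zero2 corresponds to ZeRO stage 2, as the name suggests. The parameter count and GPU counts are hypothetical, and activations, the additional models kept in memory during RLHF training, and the effect of --lora_rank are all ignored, so this is a sketch and not a statement of what train_prompts.py actually allocates.

```python
# Rough model-state memory per GPU under mixed-precision Adam.
# Assumed accounting (ZeRO paper): fp16 weights 2 B/param, fp16 gradients
# 2 B/param, fp32 optimizer states 12 B/param (master weights, momentum, variance).
BYTES_WEIGHTS = 2
BYTES_GRADS = 2
BYTES_OPTIM = 12


def model_state_gib(num_params: int, num_gpus: int = 1, zero_stage: int = 0) -> float:
    """Estimate model-state memory per GPU in GiB.

    zero_stage 0: everything is replicated on every GPU.
    zero_stage 1: optimizer states are sharded across GPUs.
    zero_stage 2: optimizer states and gradients are sharded across GPUs.
    """
    if zero_stage not in (0, 1, 2):
        raise ValueError("zero_stage must be 0, 1, or 2")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    weights = BYTES_WEIGHTS * num_params
    grads = BYTES_GRADS * num_params
    optim = BYTES_OPTIM * num_params
    if zero_stage >= 1:
        optim /= num_gpus
    if zero_stage >= 2:
        grads /= num_gpus
    return (weights + grads + optim) / 2**30


if __name__ == "__main__":
    params = 1_000_000_000  # hypothetical 1B-parameter model, not taken from the thread
    for gpus, stage in [(1, 0), (2, 0), (2, 2)]:
        gib = model_state_gib(params, gpus, stage)
        print(f"{gpus} GPU(s), ZeRO stage {stage}: {gib:.1f} GiB")
```

Under this accounting, sharding only helps when there is more than one GPU: with num_gpus=1 the stage 2 estimate equals the unsharded one, so any saving on a single card would have to come from something this sketch does not model.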
Thank you! I'll try again!
@ht-zhou How much GPU memory does this strategy need? I tried it and got this error: Could not find 'RANK' in the torch environment
Traceback (most recent call last):
File "train_prompts.py", line 122, in
Hi @JThh, can you help answer this question?
The RANK error may be fixed by launching the script with torchrun rather than directly with python.
torchrun --standalone --nproc_per_node=2 train_prompts.py <Insert args here>
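The 'RANK' lookup fails because a plain python launch does not set the distributed environment variables that torchrun exports to each worker process (RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT). Below is a minimal sketch of a pre-flight check using only the standard library. The helper name missing_dist_env is hypothetical and is not part of ColossalAI or PyTorch.

```python
import os
from typing import List, Mapping, Optional

# Environment variables that torchrun sets for every worker process.
TORCHRUN_VARS = ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def missing_dist_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the torchrun variables absent from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in TORCHRUN_VARS if name not in env]


if __name__ == "__main__":
    missing = missing_dist_env()
    if missing:
        # Typical when the script is started with `python` instead of `torchrun`.
        print("Not launched via torchrun? Missing:", ", ".join(missing))
    else:
        print(f"rank {os.environ['RANK']} of {os.environ['WORLD_SIZE']}")
```

Run directly with python, this reports the missing variables. Run under torchrun, each worker prints its own rank.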
@chingfeng2021, has the issue been resolved?
We have made many updates since this was reported. Please check the latest code. This issue was closed due to inactivity. Thanks.