
[BUG]: train_sft.py LLAMA-7B colossalai_zero2 OOM

Open • TyrionZK opened this issue 1 year ago • 4 comments

πŸ› Describe the bug

I ran train_sft.py to fine-tune LLAMA-7B with the colossalai_zero2 strategy, batch_size=1 and max_len=512, but it ran out of GPU memory (OOM).

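Theoretically, the memory usage per GPU should be about (2 + (2 + 12)/4) × 7 = 38.5 GB (fp16 weights replicated on every GPU; fp16 gradients and fp32 optimizer states sharded across the 4 GPUs), plus the memory needed for the model inputs.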
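As a quick sanity check of this estimate, the sketch below reproduces the per-GPU model-state arithmetic. It assumes the standard mixed-precision Adam accounting (2 bytes/param fp16 weights, 2 bytes/param fp16 gradients, 12 bytes/param for fp32 master weights plus Adam momentum and variance), and that ZeRO stage 2 shards gradients and optimizer states while keeping full weights on every GPU. The helper `zero_model_state_gb` is hypothetical, not a ColossalAI API, and it does not count activations, temporary buffers or allocator fragmentation.

```python
# Hypothetical helper (not a ColossalAI API): per-GPU model-state memory
# for mixed-precision Adam under ZeRO stages 0-3.
PARAM_BYTES = 2.0   # fp16 weights
GRAD_BYTES = 2.0    # fp16 gradients
OPTIM_BYTES = 12.0  # fp32 master weights + Adam momentum + Adam variance


def zero_model_state_gb(params_billion: float, num_gpus: int, stage: int) -> float:
    """Return estimated model-state memory per GPU in GB (1 GB = 1e9 bytes).

    Activations, temporary buffers and allocator fragmentation are ignored.
    """
    if stage == 0:    # nothing sharded
        per_param = PARAM_BYTES + GRAD_BYTES + OPTIM_BYTES
    elif stage == 1:  # only optimizer states sharded
        per_param = PARAM_BYTES + GRAD_BYTES + OPTIM_BYTES / num_gpus
    elif stage == 2:  # gradients and optimizer states sharded
        per_param = PARAM_BYTES + (GRAD_BYTES + OPTIM_BYTES) / num_gpus
    elif stage == 3:  # weights, gradients and optimizer states sharded
        per_param = (PARAM_BYTES + GRAD_BYTES + OPTIM_BYTES) / num_gpus
    else:
        raise ValueError(f"unsupported ZeRO stage: {stage}")
    # (billions of params) * (bytes per param) = GB
    return params_billion * per_param


if __name__ == "__main__":
    # LLAMA-7B on 4 GPUs with ZeRO-2, as in the report
    print(zero_model_state_gb(7, 4, stage=2))  # 38.5
```

For comparison, the same accounting with no sharding (stage 0) gives 7 × 16 = 112 GB of model states per GPU, which would not fit on an 80 GB card.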

I don't understand why; this is very different from the official experiment results.

Environment

My devices: 4 × A800 80GB

TyrionZK, Oct 12 '23 09:10

Bot detected that the issue body's language is not English and translated it automatically.


Title: [BUG]: train_sft.py LLAMA-7B colossalai_zero2 OOM

Issues-translate-bot, Oct 12 '23 09:10

Thank you for your feedback 😃 Could you tell me the steps to reproduce it, such as the scripts you ran and how you configured the environment? Then I can try to reproduce the bug on our A100 machine and help with it.

Orion-Zheng, Oct 13 '23 03:10

I just ran the official code: https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples/train_sft.py

TyrionZK, Oct 13 '23 08:10

Thank you for your concern. We will address this issue as soon as possible.

flybird11111, Oct 24 '23 03:10