[BUG]: train_sft.py LLAMA-7B colossalai_zero2 OOM
🐛 Describe the bug
I ran train_sft.py to fine-tune LLAMA-7B with colossalai_zero2, batch_size=1, max_len=512, but an OOM error occurs.
Theoretically, the memory usage of a single GPU should be about (2 + (2+12)/4) * 7 = 38.5 GB: 2 bytes/param for the fp16 weights, plus fp16 gradients (2 bytes/param) and Adam optimizer states (12 bytes/param) sharded across the 4 GPUs under ZeRO-2, plus some extra usage for the model inputs.
I don't know why it OOMs; this is so different from the official experiments.
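For reference, a minimal sketch of that estimate (the byte counts are my assumptions for standard fp16 mixed-precision training with Adam; `PARAMS_B` and `NUM_GPUS` are illustrative names, not from the script):

```python
# Sketch of the per-GPU memory estimate above, assuming fp16 weights
# (2 bytes/param) are replicated on every GPU, while fp16 gradients
# (2 bytes/param) and Adam optimizer states (12 bytes/param: fp32 master
# weights + momentum + variance) are sharded across GPUs, as ZeRO-2
# partitions gradients and optimizer states.
PARAMS_B = 7   # LLAMA-7B: ~7 billion parameters, so bytes/param ~ GB
NUM_GPUS = 4   # 4 * A800 80G

weights_gb = 2 * PARAMS_B                    # 14.0 GB, unsharded fp16 weights
sharded_gb = (2 + 12) * PARAMS_B / NUM_GPUS  # 24.5 GB, grads + optim states

print(f"estimated per-GPU usage: {weights_gb + sharded_gb:.1f} GB")  # 38.5 GB
```

Activations for batch_size=1 and max_len=512 come on top of this, but that still leaves a large unexplained gap up to 80 GB.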
Environment
My device: 4 × A800 (80 GB)
Thank you for your feedback! Could you tell me the steps to reproduce it, such as the scripts you ran and the procedure you used to configure the environment? Then I can reproduce the bug on our A100 machine and help with it.
I just ran the official code: https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples/train_sft.py
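Roughly an invocation like this (a hypothetical command: the flag names are assumed from the example's argparse setup and the checkpoint path is a placeholder, so it may differ from the exact script version):

```bash
torchrun --standalone --nproc_per_node=4 train_sft.py \
    --strategy colossalai_zero2 \
    --model llama \
    --pretrain /path/to/llama-7b \
    --max_len 512 \
    --batch_size 1
```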
Thank you for the information. We will address this issue as soon as possible.