ColossalAI
[BUG]: memory not decrease

Open songjin321 opened this issue 3 years ago • 3 comments

🐛 Describe the bug

I used the train_pokemon.yaml file to train the model on my machine with a 3090 Ti GPU and got an OOM error. I then set the batch size to 1, but GPU memory usage is still 18 GB. There must be something wrong.

Environment

No response

songjin321 Nov 10 '22 12:11

Hi @songjin321, thank you for your feedback. We will try to reproduce your issue and fix it soon.

binmakeswell Nov 15 '22 05:11

You can try train_colossalai.yaml instead.

Fazziekey Nov 15 '22 08:11

The train_pokemon.yaml config is not complete yet.

Fazziekey Nov 15 '22 08:11