ColossalAI
[BUG]: CUDA out of memory.
🐛 Describe the bug
When running the train_sft.sh script from the Chat examples, the GPU runs out of memory. Full error:
CUDA out of memory. Tried to allocate 25.10 GiB (GPU 0; 23.99 GiB total capacity; 75.46 GiB already allocated; 0 bytes free; 75.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
My GPU is a 24 GB 4090.
Environment
My GPU is a 24 GB 4090.
Which strategy are you using? Could you try the gemini auto strategy?
I used the colossalai_zero2 strategy. I also tried every strategy listed in the choices in train_sft.py, including colossalai_gemini, and the same problem still occurs.
How big is your model?
I'm using the llama2-7B model.
Hello, to fine-tune llama2-7B on a single 24 GB 4090, you can try the GeminiPlugin: set placement_policy to static, and increase offload_optim_frac and offload_param_frac until you no longer hit OOM. Using the LowLevelZeroPlugin with cpu_offload set to True may also work.
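For reference, here is a minimal sketch (not from the thread) of how those suggested settings might be passed through ColossalAI's Booster API. The argument names follow the GeminiPlugin / LowLevelZeroPlugin signatures in recent releases and may differ in your installed version, so check the docs of the version you have; the model/optimizer/dataloader objects are placeholders.

```python
# Sketch only: assumes a ColossalAI release whose GeminiPlugin exposes
# offload_optim_frac / offload_param_frac and whose LowLevelZeroPlugin
# exposes cpu_offload, as suggested above.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin

# Option 1: Gemini with static placement, pushing optimizer states and
# parameters toward CPU. Raise the two fractions (up to 1.0) until OOM stops.
plugin = GeminiPlugin(
    placement_policy="static",
    offload_optim_frac=1.0,  # fraction of optimizer states offloaded to CPU
    offload_param_frac=1.0,  # fraction of parameters offloaded to CPU
)

# Option 2: ZeRO-2 with CPU offloading of optimizer states.
# plugin = LowLevelZeroPlugin(stage=2, cpu_offload=True)

booster = Booster(plugin=plugin)
# Wrap your own model/optimizer/dataloader before training, e.g.:
# model, optimizer, _, dataloader, _ = booster.boost(model, optimizer, dataloader=dataloader)
```

Even with aggressive offloading, fine-tuning a 7B model on a single 24 GB card is tight, so a small batch size and gradient checkpointing are likely still needed.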