
[BUG]: CUDA out of memory.

Open Xiaozl11 opened this issue 1 year ago • 10 comments

🐛 Describe the bug

When running the train_sft.sh script from the Chat examples, I get a GPU out-of-memory error. Full traceback: CUDA out of memory. Tried to allocate 25.10 GiB (GPU 0; 23.99 GiB total capacity; 75.46 GiB already allocated; 0 bytes free; 75.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
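As the allocator hint in the traceback suggests, fragmentation-related OOMs can sometimes be mitigated by setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF before launching the script. A sketch (the value 512 is an arbitrary starting point; note this cannot help when, as here, a single allocation exceeds the card's total capacity):

```shell
# Cap the size of splittable cached blocks to reduce fragmentation.
# 512 MB is an arbitrary starting value; tune per workload.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# ...then launch the training script in the same shell.
```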

My GPU is a 24 GB 4090.

Environment

My GPU is a 24 GB 4090.

Xiaozl11 avatar Oct 24 '23 02:10 Xiaozl11

Which strategy are you using? Could you try the gemini auto strategy?

flybird11111 avatar Oct 24 '23 02:10 flybird11111


> Which strategy are you using? Could you try the gemini auto strategy?

I was using the colossalai_zero2 strategy. I have tried every strategy listed in the choices in train_sft.py, including colossalai_gemini, and the same problem still occurs.

Xiaozl11 avatar Oct 24 '23 07:10 Xiaozl11


How big is your model?

flybird11111 avatar Oct 24 '23 07:10 flybird11111


> How big is your model?

I'm using the llama2-7B model.

Xiaozl11 avatar Oct 24 '23 08:10 Xiaozl11
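For context on why this configuration runs out of memory: a back-of-the-envelope estimate for full fine-tuning of a 7B-parameter model with Adam in mixed precision (assumed byte counts per tensor type, ignoring activations and buffers) already far exceeds a single 24 GiB card, which is why the maintainers suggest CPU offloading below. A rough sketch:

```python
# Rough memory estimate for full mixed-precision fine-tuning of a
# 7B-parameter model with Adam, without any offloading.
# All figures are illustrative assumptions, not measurements.

N = 7e9  # parameter count of llama2-7B

weights_fp16  = N * 2  # fp16 weights
grads_fp16    = N * 2  # fp16 gradients
master_fp32   = N * 4  # fp32 master copy of weights
adam_momentum = N * 4  # fp32 first moment
adam_variance = N * 4  # fp32 second moment

total_gib = (weights_fp16 + grads_fp16 + master_fp32
             + adam_momentum + adam_variance) / 2**30

# Prints an estimate on the order of 100 GiB, before activations --
# several times the 24 GiB available on a single 4090.
print(f"~{total_gib:.0f} GiB before activations")
```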


Hello, to fine-tune llama2-7B on a single 24 GB 4090, you can try the GeminiPlugin: set placement_policy to static, then increase the two parameters offload_optim_frac and offload_param_frac until OOM no longer occurs.

Using the LowLevelZeroPlugin with cpu_offload set to True may also work.
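The GeminiPlugin suggestion above might be sketched as follows, assuming a ColossalAI version whose GeminiPlugin exposes placement_policy, offload_optim_frac, and offload_param_frac; the fraction values are illustrative starting points, not tuned settings:

```python
# Configuration sketch (untested assumption): static placement with
# optimizer states and most parameters offloaded to CPU, to fit
# llama2-7B fine-tuning on a single 24 GB card.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

plugin = GeminiPlugin(
    placement_policy="static",
    offload_optim_frac=1.0,  # offload all optimizer states to CPU
    offload_param_frac=0.7,  # offload 70% of parameters; raise if OOM persists
)
booster = Booster(plugin=plugin)
# model, optimizer, ... = booster.boost(model, optimizer, ...)
```

The trade-off is speed: the larger the offload fractions, the more CPU-GPU traffic per step, so keep them only as high as needed to avoid OOM.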

Fridge003 avatar Oct 25 '23 02:10 Fridge003
