
[BUG]: CUDA out of memory.

Open Xiaozl11 opened this issue 1 year ago • 10 comments

🐛 Describe the bug

When running the train_sft.sh script from the Chat examples, I get a GPU out-of-memory error. Full traceback: CUDA out of memory. Tried to allocate 25.10 GiB (GPU 0; 23.99 GiB total capacity; 75.46 GiB already allocated; 0 bytes free; 75.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
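As the allocator hint in the traceback suggests, fragmentation-related OOMs can sometimes be mitigated by setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF before launching the script. A sketch (the value 512 is an arbitrary starting point; note this cannot help when, as here, a single allocation exceeds the card's total capacity):

```shell
# Cap the size of splittable cached blocks to reduce fragmentation.
# 512 MB is an arbitrary starting value; tune per workload.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# ...then launch the training script in the same shell.
```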

My GPU is a 24 GB 4090.

Environment

My GPU is a 24 GB 4090.

Xiaozl11 avatar Oct 24 '23 02:10 Xiaozl11

Which strategy are you using? Could you try the gemini auto strategy?

flybird11111 avatar Oct 24 '23 02:10 flybird11111


> Which strategy are you using? Could you try the gemini auto strategy?

I was using the colossalai_zero2 strategy. I have tried every strategy listed in the choices in train_sft.py, including colossalai_gemini, and the same problem still occurs.

Xiaozl11 avatar Oct 24 '23 07:10 Xiaozl11


How big is your model?

flybird11111 avatar Oct 24 '23 07:10 flybird11111


> How big is your model?

I'm using the llama2-7B model.

Xiaozl11 avatar Oct 24 '23 08:10 Xiaozl11
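For context on why this configuration runs out of memory: a back-of-the-envelope estimate for full fine-tuning of a 7B-parameter model with Adam in mixed precision (assumed byte counts per tensor type, ignoring activations and buffers) already far exceeds a single 24 GiB card, which is why the maintainers suggest CPU offloading below. A rough sketch:

```python
# Rough memory estimate for full mixed-precision fine-tuning of a
# 7B-parameter model with Adam, without any offloading.
# All figures are illustrative assumptions, not measurements.

N = 7e9  # parameter count of llama2-7B

weights_fp16  = N * 2  # fp16 weights
grads_fp16    = N * 2  # fp16 gradients
master_fp32   = N * 4  # fp32 master copy of weights
adam_momentum = N * 4  # fp32 first moment
adam_variance = N * 4  # fp32 second moment

total_gib = (weights_fp16 + grads_fp16 + master_fp32
             + adam_momentum + adam_variance) / 2**30

# Prints an estimate on the order of 100 GiB, before activations --
# several times the 24 GiB available on a single 4090.
print(f"~{total_gib:.0f} GiB before activations")
```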


Hello, to fine-tune llama2-7B on a single 24 GB 4090, you can try the GeminiPlugin: set placement_policy to static, then increase the two parameters offload_optim_frac and offload_param_frac until OOM no longer occurs.

Using the LowLevelZeroPlugin with cpu_offload set to True may also work.
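The GeminiPlugin suggestion above might be sketched as follows, assuming a ColossalAI version whose GeminiPlugin exposes placement_policy, offload_optim_frac, and offload_param_frac; the fraction values are illustrative starting points, not tuned settings:

```python
# Configuration sketch (untested assumption): static placement with
# optimizer states and most parameters offloaded to CPU, to fit
# llama2-7B fine-tuning on a single 24 GB card.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

plugin = GeminiPlugin(
    placement_policy="static",
    offload_optim_frac=1.0,  # offload all optimizer states to CPU
    offload_param_frac=0.7,  # offload 70% of parameters; raise if OOM persists
)
booster = Booster(plugin=plugin)
# model, optimizer, ... = booster.boost(model, optimizer, ...)
```

The trade-off is speed: the larger the offload fractions, the more CPU-GPU traffic per step, so keep them only as high as needed to avoid OOM.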

Fridge003 avatar Oct 25 '23 02:10 Fridge003
