DPO: uneven GPU memory distribution across devices
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
- `llamafactory` version: 0.9.2.dev0
- Platform: Linux-3.10.0-1160.95.1.el7.x86_64-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version: 2.6.0+cu124 (GPU)
- Transformers version: 4.48.3
- Datasets version: 3.1.0
- Accelerate version: 1.0.1
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA A800 80GB PCIe
- DeepSpeed version: 0.15.1
- vLLM version: 0.8.2
Reproduction
14B model, single node with 8×A800, full-parameter fine-tuning, batch size can only be set to 1, using `ds_z3_offload`. Current problems:
1. Training crashes with OOM after a certain number of steps.
2. Yet for SFT with `ds_z3`, batch_size can be raised to 4.
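For context, `ds_z3_offload` refers to a DeepSpeed ZeRO-3 configuration with CPU offloading of optimizer states and parameters. A minimal sketch of such a config follows (field names are from DeepSpeed's documentation; the specific values shown are illustrative assumptions, not the exact file shipped with LLaMA-Factory):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" }
}
```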
Others
No response
Same problem here. Qwen2.5VL-7B-Instruct with a LoRA for SFT suddenly goes out of memory after a certain number of steps. Both batch size and gradient_accumulation_steps can only be set to 1, on 8× A800 80 GB. During training, one GPU suddenly takes a very large share of memory and then OOMs:

- GPU 5 Memory Allocated (%): 99.98500021922756
- GPU 6 Memory Allocated (%): 78.07762809620475
- GPU 2 Memory Allocated (%): 62.247705610456464
- GPU 3 Memory Allocated (%): 55.404728700119
- GPU 1 Memory Allocated (%): 54.42012770582584
- GPU 0 Memory Allocated (%): 53.56598634327652
- GPU 7 Memory Allocated (%): 53.100762373472996
- GPU 4 Memory Allocated (%): 51.39001814588864
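The imbalance is easier to see when the reported per-GPU figures are ranked. A minimal stdlib-only sketch (the numbers are copied from the report above; the max/min ratio is just one simple way to quantify the skew):

```python
# Per-GPU allocated-memory percentages as reported above (GPU index -> %).
alloc = {
    5: 99.985, 6: 78.078, 2: 62.248, 3: 55.405,
    1: 54.420, 0: 53.566, 7: 53.101, 4: 51.390,
}

# Sort GPUs from most to least loaded to expose the outlier device.
ranked = sorted(alloc.items(), key=lambda kv: kv[1], reverse=True)
for gpu, pct in ranked:
    print(f"GPU {gpu}: {pct:6.2f}%")

# Simple imbalance metric: ratio of the most- to least-loaded GPU.
imbalance = max(alloc.values()) / min(alloc.values())
print(f"max/min imbalance: {imbalance:.2f}x")
```

Here GPU 5 holds nearly twice the allocation of GPU 4, which is consistent with a single rank accumulating extra state until it OOMs.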
Same problem. How did you all solve it? My version is 0.9.4.dev0.
same problem
@hiyouga could you take a look at this?