[P1] GPU Memory usage issue
Why is the trainable parameter count only 0.03%, yet memory usage during training reaches over 60 GB, whereas LoRA training usually requires only around 17 GB?
Hey @TranscenderNing, thanks for your interest. What are your training arguments for both LoRA and ReFT? Memory usage depends on the per-device batch size, and on whether you run with FSDP, which floating-point precision you use, etc.
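For reference, these are the knobs that usually dominate GPU memory rather than the trainable-parameter count. The argument names follow the Hugging Face `transformers.TrainingArguments` API; the values below are purely illustrative assumptions, not your actual config:

```python
from transformers import TrainingArguments

# Illustrative settings only -- not the reporter's actual run.
# Activation and optimizer memory are driven mainly by per-device batch
# size, sequence length, precision, and gradient checkpointing.
training_args = TrainingArguments(
    output_dir="./reft_out",
    per_device_train_batch_size=4,   # halving this roughly halves activation memory
    gradient_accumulation_steps=8,   # keep the effective batch size without extra memory
    bf16=True,                       # mixed precision shrinks activations and optimizer state
    gradient_checkpointing=True,     # trade extra compute for lower activation memory
)
```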
@PinetreePantry pinging Peter here. We also did memory profiling for LoRA and ReFT: with the same training parameters, the two have similar memory profiles, and ReFT tends to show lower utilization because fewer FLOPs are required to perform position-based interventions.
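If it helps to compare the two setups apples to apples, PyTorch's built-in counters give a quick peak-memory number per run. This is just a generic sketch, not the exact profiling script we used:

```python
import torch

# Reset the counter before the run you want to measure.
torch.cuda.reset_peak_memory_stats()

# ... run one or more training steps here ...

# Report the peak memory actually allocated by tensors on the GPU.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated GPU memory: {peak_gib:.2f} GiB")
```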
When I was playing around with ReFT, I also ran into cases where it used a lot of GPU memory. I suggest not padding to a fixed high value - that can bloat GPU memory significantly. Try "padding = longest", "padding = True", or "padding = False" instead (see the sketch below); you may see a large reduction in GPU memory usage.
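Concretely, with a Hugging Face tokenizer the difference looks like this sketch (the model name and max_length are placeholders for illustration): `padding="max_length"` pads every example to the fixed maximum, while `padding="longest"` / `padding=True` only pads to the longest sequence in the batch, so the tensors are usually much smaller:

```python
from transformers import AutoTokenizer

# "gpt2" is just a placeholder model; substitute your own tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

batch = ["a short prompt", "a much longer prompt with quite a few more tokens in it"]

# Pads every example to a fixed length -> large tensors, high GPU memory.
fixed = tokenizer(batch, padding="max_length", max_length=512, return_tensors="pt")

# Pads only to the longest example in the batch -> much smaller tensors.
dynamic = tokenizer(batch, padding="longest", return_tensors="pt")

print(fixed["input_ids"].shape)    # e.g. torch.Size([2, 512])
print(dynamic["input_ids"].shape)  # e.g. torch.Size([2, <length of longest prompt>])
```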