[P1] GPU Memory usage issue
Why is the trainable parameter count only 0.03%, yet memory usage during training reaches over 60 GB, whereas LoRA training usually requires only around 17 GB?
Hey @TranscenderNing, thanks for your interest. What are your training arguments for both LoRA and ReFT? Memory usage depends on the per-device batch size, and on whether you run with FSDP, which floating-point precision you use, etc.
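For reference, these are the knobs that usually dominate GPU memory rather than the trainable-parameter count. The argument names follow the Hugging Face `transformers.TrainingArguments` API; the values below are purely illustrative assumptions, not your actual config:

```python
from transformers import TrainingArguments

# Illustrative settings only -- not the reporter's actual run.
# Activation and optimizer memory are driven mainly by per-device batch
# size, sequence length, precision, and gradient checkpointing.
training_args = TrainingArguments(
    output_dir="./reft_out",
    per_device_train_batch_size=4,   # halving this roughly halves activation memory
    gradient_accumulation_steps=8,   # keep the effective batch size without extra memory
    bf16=True,                       # mixed precision shrinks activations and optimizer state
    gradient_checkpointing=True,     # trade extra compute for lower activation memory
)
```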
@PinetreePantry pinging Peter here. We also did memory profiling for LoRA and ReFT: with the same training parameters, the two have similar memory profiles, and ReFT tends to show lower utilization because fewer FLOPs are required to perform position-based interventions.
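If it helps to compare the two setups apples to apples, PyTorch's built-in counters give a quick peak-memory number per run. This is just a generic sketch, not the exact profiling script we used:

```python
import torch

# Reset the counter before the run you want to measure.
torch.cuda.reset_peak_memory_stats()

# ... run one or more training steps here ...

# Report the peak memory actually allocated by tensors on the GPU.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated GPU memory: {peak_gib:.2f} GiB")
```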
When I was playing around with ReFT, I also ran into cases where it used a lot of GPU memory. I suggest not padding to a fixed high value - that can bloat GPU memory significantly. Try "padding = longest", "padding = True", or "padding = False" instead (see the sketch below); you may see a large reduction in GPU memory usage.
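Concretely, with a Hugging Face tokenizer the difference looks like this sketch (the model name and max_length are placeholders for illustration): `padding="max_length"` pads every example to the fixed maximum, while `padding="longest"` / `padding=True` only pads to the longest sequence in the batch, so the tensors are usually much smaller:

```python
from transformers import AutoTokenizer

# "gpt2" is just a placeholder model; substitute your own tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

batch = ["a short prompt", "a much longer prompt with quite a few more tokens in it"]

# Pads every example to a fixed length -> large tensors, high GPU memory.
fixed = tokenizer(batch, padding="max_length", max_length=512, return_tensors="pt")

# Pads only to the longest example in the batch -> much smaller tensors.
dynamic = tokenizer(batch, padding="longest", return_tensors="pt")

print(fixed["input_ids"].shape)    # e.g. torch.Size([2, 512])
print(dynamic["input_ids"].shape)  # e.g. torch.Size([2, <length of longest prompt>])
```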