peft
add accelerate example for DDP and FSDP in sequence classification for non-lora case
@pacman100 please help review.
The documentation is not available anymore as the PR was closed or merged.
Yes, @pacman100, I see a memory decrease with FSDP. I fine-tuned LLaMA 7B on 2 GPUs (RTX 8000) using p-tuning. With a training batch size of 8, plain DDP (no FSDP) crashes with an OOM error, while FSDP runs without crashing. With CPU offload enabled, FSDP memory usage drops further compared with FSDP without CPU offload. Note that you need to apply 352 to use CPU offload in FSDP.
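For anyone wondering why DDP hits OOM here while FSDP does not, here is a rough back-of-the-envelope sketch. It only counts model weights: DDP keeps a full replica on every GPU, whereas FSDP with full sharding splits the parameters across ranks. The round parameter count (7e9) and the fp32 dtype are assumptions for illustration, not measurements from my run, and the helper name `param_bytes_per_gpu` is made up for this sketch. Real peak usage is higher because of activations, temporarily gathered layers, and the (small, for p-tuning) trainable state.

```python
def param_bytes_per_gpu(num_params, bytes_per_param, num_gpus, sharded):
    """Rough weight memory per GPU in bytes.

    Ignores activations, gradients, optimizer state, and the layers that
    FSDP temporarily gathers during forward/backward.
    """
    total = num_params * bytes_per_param
    # DDP: every rank holds the full model. FSDP (full shard): weights are split.
    return total / num_gpus if sharded else total


NUM_PARAMS = 7_000_000_000  # assumption: round figure for a "7B" model
BYTES_PER_PARAM = 4         # assumption: fp32 weights
NUM_GPUS = 2                # the 2-GPU setup from the comment above

ddp_gb = param_bytes_per_gpu(NUM_PARAMS, BYTES_PER_PARAM, NUM_GPUS, sharded=False) / 1e9
fsdp_gb = param_bytes_per_gpu(NUM_PARAMS, BYTES_PER_PARAM, NUM_GPUS, sharded=True) / 1e9

print(f"DDP weights per GPU:  {ddp_gb:.1f} GB")
print(f"FSDP weights per GPU: {fsdp_gb:.1f} GB")
```

Under these assumptions the weights alone take about 28 GB per GPU with DDP versus about 14 GB with FSDP, which leaves much more headroom for activations at batch size 8. CPU offload moves the sharded parameters to host memory when they are not in use, which would explain the further drop.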