
Is fine-tuning with, e.g., LoRA supported?

Emerald01 opened this issue 2 years ago · 1 comment

FSDP looks like a great way to distribute the base model, but does this codebase support LoRA fine-tuning? Usually, what we want DPO to do is train the adapter layers rather than the entire base model. If we could load the model with FSDP while fine-tuning only the LoRA layers, that would keep GPU memory manageable even for the largest models, I'd guess.

Emerald01 avatar Oct 01 '23 03:10 Emerald01

I recently reimplemented DPO with QLoRA on the LLaMA-7B model. The trained model is already available on Hugging Face: https://huggingface.co/abaheti95/dpo_qlora_hh. Here is the corresponding implementation code: https://github.com/abaheti95/LoL-RL/blob/main/dpo_qlora_llama_hh.py
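
For reference, here is a minimal sketch of the same idea using the Hugging Face TRL and PEFT libraries (this is not the exact code from the repo linked above; the base model name, toy dataset, and hyperparameters are illustrative, and the `DPOTrainer` keyword arguments reflect TRL ~0.7-era versions, so check your installed version's docs):

```python
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments,
)
from peft import LoraConfig
from trl import DPOTrainer

model_name = "huggyllama/llama-7b"  # assumed base model; substitute your own

# 4-bit NF4 quantization (QLoRA): the base weights stay frozen and quantized,
# so only the LoRA adapters are trained in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapters on the attention projections; only these weights get gradients.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# DPO expects preference pairs with "prompt", "chosen", "rejected" columns;
# this tiny in-memory dataset is just a placeholder for real preference data.
dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["Direct Preference Optimization trains on preference pairs."],
    "rejected": ["I don't know."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT model, TRL uses the frozen base (adapters disabled) as the implicit reference
    args=TrainingArguments(
        output_dir="dpo_qlora_out",
        per_device_train_batch_size=1,
        remove_unused_columns=False,
    ),
    beta=0.1,  # DPO temperature on the implicit reward
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Because the reference model is just the base model with adapters turned off, you avoid keeping a second full copy of the weights in memory, which is most of the memory savings here.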

I hope this helps.

abaheti95 avatar Oct 04 '23 16:10 abaheti95