direct-preference-optimization
Is fine-tuning with, e.g., LoRA supported?
FSDP looks like a great module for distributing the base model, but does this codebase support LoRA fine-tuning? Usually what we want DPO to do is train the adapter layers rather than the entire base model. If we could load the model with FSDP while fine-tuning only the LoRA layers, that would keep GPU memory manageable even for the largest models, I guess.
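Roughly, what I have in mind is something like this (a sketch using Hugging Face peft; the model name and target modules are just placeholders, and the FSDP wrapping would still happen as in this codebase):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model; in this repo it would then be wrapped with FSDP
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", torch_dtype=torch.bfloat16
)

# Attach LoRA adapters so only they are trained, not the full base model
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights should be trainable
```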
I recently reimplemented DPO with QLoRA on the LLaMA-7B model. The model is already available on Hugging Face: https://huggingface.co/abaheti95/dpo_qlora_hh. Here is the corresponding implementation code: https://github.com/abaheti95/LoL-RL/blob/main/dpo_qlora_llama_hh.py
I hope this helps.
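If it's useful, a minimal DPO-with-QLoRA setup along these lines can be sketched with trl's DPOTrainer plus peft and bitsandbytes. This is only an illustration under those library assumptions (and a recent trl version; some argument names such as `processing_class` vs `tokenizer` differ across releases), not the exact code in the link above; the toy dataset and hyperparameters are placeholders.

```python
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# Load the base model in 4-bit (QLoRA-style quantization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapters; only these parameters get gradients
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Toy preference data with prompt / chosen / rejected columns
train_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["DPO trains a policy directly on preference pairs."],
    "rejected": ["I don't know."],
})

training_args = DPOConfig(
    output_dir="dpo-qlora-llama-7b",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    beta=0.1,  # strength of the implicit KL penalty in the DPO loss
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,  # reference model = frozen base with adapters disabled
)
trainer.train()
```

One nice property of this setup is that no separate reference model has to be kept in memory: with a PEFT model, the frozen base with adapters disabled serves as the reference policy.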