alignment-handbook
Why is zephyr-7b-dpo-lora finetuned from mistralai/Mistral-7B-v0.1 instead of the zephyr-7b-sft model?
There is a misalignment between zephyr-7b-dpo-lora and zephyr-7b-dpo-full. The former is finetuned from mistralai/Mistral-7B-v0.1, while the latter is finetuned from the zephyr-7b-sft-full model.
I wonder what causes this misalignment.
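To make the discrepancy concrete, here is a minimal sketch of how one can check which base model a published LoRA adapter records in its PEFT config; the Hub repo ids below are assumptions on my part, not taken from the handbook.

```python
# Minimal sketch: inspect which base model each published adapter points at.
# The Hub repo ids are assumptions and may differ from the actual ones.
from peft import PeftConfig

for repo_id in [
    "alignment-handbook/zephyr-7b-sft-lora",  # assumed id of the SFT LoRA adapter
    "alignment-handbook/zephyr-7b-dpo-lora",  # assumed id of the DPO LoRA adapter
]:
    cfg = PeftConfig.from_pretrained(repo_id)
    # base_model_name_or_path records the model the adapter was trained on top of
    print(repo_id, "->", cfg.base_model_name_or_path)
```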
Also, have you benchmarked the performance improvement of the LoRA fine-tuning script? In my experiments, LoRA fine-tuning does not seem to provide any performance improvement over the base model on MT-Bench. I suspect some parameters may be incorrect.
I found the same issue here
In general, we observe better performance with the full finetune. However, we did not perform a full hyperparameter scan on the LoRA configs, so I am sure improvements can be made there.
As for the misalignment, I am not sure what you are referring to. The dpo-lora config fine-tunes on top of the sft-lora model. Can you provide some more detail?
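For reference, here is a minimal sketch of the pattern being described: with PEFT, the DPO stage can name mistralai/Mistral-7B-v0.1 as the base model while still continuing from the SFT LoRA weights, because the SFT adapter is loaded on top of the base model before DPO training starts. This is not the handbook's actual training code, and the adapter repo id is an assumption.

```python
# Minimal sketch (assumed repo ids, not the handbook's actual code):
# the DPO stage starts from the raw base model but loads the SFT LoRA
# adapter on top, so training effectively continues from the SFT weights.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach the SFT LoRA adapter to the base model.
model = PeftModel.from_pretrained(base, "alignment-handbook/zephyr-7b-sft-lora")

# Optionally merge the adapter into the base weights before the DPO stage.
model = model.merge_and_unload()
```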