
Why is zephyr-7b-dpo-lora finetuned from mistralai/Mistral-7B-v0.1 instead of the zephyr-7b-sft model?


There is a misalignment between zephyr-7b-dpo-lora and zephyr-7b-dpo-full: the former is fine-tuned from mistralai/Mistral-7B-v0.1, while the latter is fine-tuned from zephyr-7b-sft-full.

I wonder what causes this misalignment?

Also, have you benchmarked the performance improvement of the LoRA fine-tuning script? In my experiments, LoRA fine-tuning does not seem to provide any performance improvement over the base model on MT-Bench. I suspect some of the hyperparameters may be incorrect.

ChenDRAG · Nov 17 '23, 18:11

I ran into the same issue here.

JiuhaiChen · Nov 19 '23, 15:11

In general, we observe better performance with the full finetune. That said, we did not perform a full hyperparameter scan on the LoRA configs, so I am sure improvements can be made there.

As for the misalignment, I am not sure what you are referring to. The dpo-lora config fine-tunes on top of the sft-lora model. Can you provide some more detail?
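
For illustration only, here is a rough sketch of what "fine-tunes on top of the sft-lora model" means in code, assuming recent transformers/peft APIs. The adapter repo id below is just an example, and the handbook's actual training scripts drive this through the YAML recipes rather than a snippet like this:

```python
# Illustrative sketch, not the handbook's training script: one way the DPO-LoRA
# stage can start from the SFT checkpoint rather than the raw base model.
# Assumes recent transformers/peft; the adapter repo id is an example only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
sft_adapter_id = "alignment-handbook/zephyr-7b-sft-lora"  # example adapter id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

# Load the SFT LoRA adapter on top of the base model, then merge it so the
# DPO stage (with its own fresh LoRA adapter) starts from the SFT weights,
# even though the merged model still lists Mistral-7B-v0.1 as its base.
sft_model = PeftModel.from_pretrained(base_model, sft_adapter_id)
policy_model = sft_model.merge_and_unload()
```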

edbeeching · Nov 21 '23, 11:11