alignment-handbook Output from zephyr-7b-dpo-qlora is weird

Open ChenDRAG opened this issue 1 year ago • 0 comments

It is said that zephyr-7b-dpo-qlora is finetuned from zephyr-7b-sft-qlora. However, in the adapter config file, the base model is set to mistralai/Mistral-7B-v0.1.

Also, I downloaded the model from https://huggingface.co/alignment-handbook/zephyr-7b-dpo-qlora and ran MT-bench on it. The score is ~4.6 instead of 7+, and the responses it generates are repetitive and erroneous. This may be because I used the wrong base model. Could you give me some instructions for testing zephyr-7b-dpo-qlora?

p.s. I tried switching the base model to zephyr-7b-sft-qlora, but got the error below:

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /home/zephyr-7b-sft-qlora.
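
For anyone hitting the same error, here is a minimal sketch of a check that shows what a downloaded checkpoint directory actually contains. It assumes the directory follows the usual PEFT layout (`adapter_config.json` with a `base_model_name_or_path` field, plus `adapter_model.safetensors` or `adapter_model.bin`). The helper name `inspect_checkpoint` is hypothetical, and `model.safetensors` is added to the full-weight file names beyond those listed in the error message.

```python
import json
from pathlib import Path

# Full-model weight files: the four names from the OSError above,
# plus model.safetensors (an assumption, not listed in the error).
# Sharded checkpoints are not handled in this sketch.
FULL_WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
    "model.safetensors",
)
# Assumed PEFT adapter weight file names.
ADAPTER_WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def inspect_checkpoint(path):
    """Report whether a local directory holds full weights, adapter weights, or both."""
    directory = Path(path)
    info = {
        "has_full_weights": any((directory / n).is_file() for n in FULL_WEIGHT_FILES),
        "has_adapter_weights": any((directory / n).is_file() for n in ADAPTER_WEIGHT_FILES),
        "base_model": None,
    }
    config_file = directory / "adapter_config.json"
    if config_file.is_file():
        # base_model_name_or_path records which model the adapter was trained on.
        config = json.loads(config_file.read_text())
        info["base_model"] = config.get("base_model_name_or_path")
    return info
```

Calling `inspect_checkpoint("/home/zephyr-7b-sft-qlora")` would show whether that directory holds only adapter weights. If it does, that would be consistent with the error above, since an adapter-only directory cannot be loaded as a base model on its own. That reading is an assumption and has not been confirmed in this thread.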

Jan 11 '24 09:01 ChenDRAG
