deita reproduce mt-bench score

reproduce mt-bench score

Open • bpucla opened this issue 11 months ago • 1 comment

Dear Authors,

Thank you for your great work! I'm trying to reproduce the reported MT-Bench scores with the released code and data.

Trying to reproduce:

- DEITA-7B-v1.0 (6K) --> mt-bench: 7.22
- DEITA-7B-v1.0-sft --> mt-bench: 7.32

Data I used:

- hkust-nlp/deita-6k-v0
- hkust-nlp/deita-10k-v0

Code I used: https://github.com/hkust-nlp/deita/blob/main/examples/train/sft.sh

The scores I got for both the 6K and 10K data are around 7.06 (vs. the reported 7.22 and 7.32), i.e. roughly 0.16 and 0.26 points lower. The difference seems larger than the usual run-to-run variability of SFT and MT-Bench evaluation.

Any suggestions to resolve the discrepancy would be appreciated.

Thanks!

Mar 21 '24 07:03 bpucla

Hi,

Thank you for your interest! We have indeed noticed some fluctuations during the model training process. One potential solution we recommend is to replicate our development environment by installing the dependencies listed in our requirements.txt file and training the model again.
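As a rough illustration of the environment check suggested above, the sketch below compares the packages installed in the current Python environment against the pins in a requirements file. It assumes the file uses plain `name==version` lines (other specifiers are skipped); `parse_pins` and `find_mismatches` are hypothetical helper names, not part of the deita repository.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Return {package: pinned_version} for lines of the form name==version.

    Assumption: only exact pins matter here; lines using other specifiers
    (>=, ~=, URLs, ...) are ignored rather than interpreted.
    """
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins, lookup=version):
    """Return {package: (pinned, installed)} where the two differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

One way to use it is `find_mismatches(parse_pins(open("requirements.txt").read()))`, run from the repository root; an empty result means every exact pin matches the installed version.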

Furthermore, a key benefit of this data-efficient instruction tuning approach is that training is cheap enough to re-train the model several times and keep the best-performing one.
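A minimal sketch of what comparing several re-training runs could look like, assuming each run has already been scored on MT-Bench. `summarize_runs` and the run names passed to it are hypothetical, and the function only does bookkeeping; it does not train or evaluate anything.

```python
from statistics import mean, stdev


def summarize_runs(scores, reported):
    """Summarize MT-Bench scores from several training runs.

    scores:   {run_name: mt_bench_score}, e.g. one entry per random seed
    reported: the published score the runs are compared against
    """
    if not scores:
        raise ValueError("need at least one run")
    values = list(scores.values())
    best_name = max(scores, key=scores.get)
    return {
        "best_run": best_name,
        "best_score": scores[best_name],
        "mean": mean(values),
        # Sample standard deviation needs at least two runs.
        "stdev": stdev(values) if len(values) > 1 else 0.0,
        "gap_to_reported": reported - scores[best_name],
    }
```

Reporting the mean and spread alongside the best run helps show whether a gap to the published number lies within run-to-run fluctuation; note that picking the best run on the benchmark itself tends to give an optimistic estimate.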

If you have any other problems, please feel free to contact us.

Mar 27 '24 03:03 VPeterV
