Reproducing MT-Bench scores
Dear Authors,
Thank you for your great work! I'm trying to reproduce the reported MT-Bench scores with the released code and data.
Trying to reproduce:
DEITA-7B-v1.0 (6K) --> mt-bench: 7.22
DEITA-7B-v1.0-sft --> mt-bench: 7.32
Data I used: hkust-nlp/deita-6k-v0 and hkust-nlp/deita-10k-v0
Code I used: https://github.com/hkust-nlp/deita/blob/main/examples/train/sft.sh
The scores I got for both 6k and 10k are around 7.06 (vs. the reported 7.22 and 7.32). The difference seems larger than the usual run-to-run variability of SFT and MT-Bench evaluation.
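To make the gap concrete, here is a quick sketch of the arithmetic. It uses only the numbers quoted above; the variable names are my own, and 7.06 is an approximate value, so the gaps are approximate too.

```python
# Reported MT-Bench scores, as quoted above.
reported = {"DEITA-7B-v1.0 (6K)": 7.22, "DEITA-7B-v1.0-sft": 7.32}

# Approximate score obtained in both of my runs (6k and 10k data).
reproduced = 7.06

# Gap between each reported score and my reproduced score, rounded to 2 decimals.
gaps = {name: round(score - reproduced, 2) for name, score in reported.items()}

for name, gap in gaps.items():
    print(f"{name}: reported - reproduced = {gap:.2f}")
```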
Any suggestions to resolve the discrepancy would be appreciated.
Thanks!