
Accuracy issue with MT-bench scores

Open Luoqiu76 opened this issue 1 year ago • 0 comments

I used the latest code to measure the MT-bench score of llama-2-chat and got only about 5.86, whereas the officially reported score is around 6.3. For my own model, judging the same responses twice with GPT-4 produced average scores that differed by about 0.2, which is surprisingly large. Additionally, the issue in #2659 does not seem to have been resolved yet, and I am not sure whether it is the cause of the discrepancy.
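To make the run-to-run comparison concrete, here is a minimal sketch of how the gap between two GPT-4 judging passes over the same responses could be computed. It assumes each judgment record is a dict with `model` and `score` keys and that a score of `-1` marks a judgment whose score could not be parsed; the helper names `mean_score` and `run_gap` and the sample records are hypothetical and are not taken from the FastChat code.

```python
from statistics import mean


def mean_score(records, model):
    """Average the valid scores for one model in a list of judgment records.

    Assumption: a score of -1 marks a judgment that failed to parse,
    so it is skipped rather than averaged in.
    """
    scores = [r["score"] for r in records
              if r["model"] == model and r["score"] != -1]
    return mean(scores)


def run_gap(run_a, run_b, model):
    """Absolute difference between the mean scores of two judging runs."""
    return abs(mean_score(run_a, model) - mean_score(run_b, model))


# Made-up records for illustration only; these are not real MT-bench results.
run_a = [
    {"model": "my-model", "score": 6},
    {"model": "my-model", "score": 7},
    {"model": "my-model", "score": -1},  # unparsed judgment, ignored
]
run_b = [
    {"model": "my-model", "score": 6},
    {"model": "my-model", "score": 6},
]

print(run_gap(run_a, run_b, "my-model"))  # 0.5
```

Running this kind of comparison over several judging passes would show whether a gap of about 0.2 is within the judge's normal variation or points to a change in the code.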

Luoqiu76, Jun 07 '24 12:06