FastChat The accuracy issue of MT bench

Open Luoqiu76 opened this issue 1 year ago • 0 comments

I used the latest code to measure the MT-bench score of llama-2-chat and got only about 5.86, whereas the officially reported score is around 6.3. For my own model, judging the same responses twice with GPT-4 gave average scores that differed by about 0.2, which is surprisingly large. Additionally, the issue in #2659 does not seem to have been resolved yet, and I am not sure whether it is the cause of this discrepancy.
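To make the comparison concrete, here is a minimal sketch of how the gap between two GPT-4 judging runs over the same responses could be measured. The record layout (`question_id`, `turn`, `score`), the function name `compare_runs`, and the sample scores are all assumptions made up for illustration; they are not the actual numbers from my runs and not necessarily the exact format FastChat writes.

```python
from statistics import mean


def compare_runs(run_a, run_b):
    """Compare two judging runs over the same (question_id, turn) pairs.

    Each run is a list of dicts with hypothetical keys
    "question_id", "turn", and "score".
    """
    # Index each run by (question_id, turn) so scores can be paired up.
    scores_a = {(r["question_id"], r["turn"]): r["score"] for r in run_a}
    scores_b = {(r["question_id"], r["turn"]): r["score"] for r in run_b}

    # Only compare items that were judged in both runs.
    shared = sorted(scores_a.keys() & scores_b.keys())
    if not shared:
        raise ValueError("the two runs have no judged items in common")

    a = [scores_a[k] for k in shared]
    b = [scores_b[k] for k in shared]
    return {
        "mean_a": mean(a),
        "mean_b": mean(b),
        # Gap between the two headline averages.
        "mean_gap": abs(mean(a) - mean(b)),
        # Average per-item disagreement, which can be large even
        # when the headline averages happen to agree.
        "mean_abs_diff": mean(abs(x - y) for x, y in zip(a, b)),
    }


# Made-up sample data, for illustration only.
run_a = [
    {"question_id": 81, "turn": 1, "score": 7},
    {"question_id": 81, "turn": 2, "score": 5},
    {"question_id": 82, "turn": 1, "score": 6},
]
run_b = [
    {"question_id": 81, "turn": 1, "score": 6},
    {"question_id": 81, "turn": 2, "score": 5},
    {"question_id": 82, "turn": 1, "score": 7},
]

result = compare_runs(run_a, run_b)
print(result)
```

Reporting both the gap between averages and the per-item disagreement helps separate judge noise from a systematic shift in scores.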

Jun 07 '24 12:06 Luoqiu76
