
What GPT-4 version was used for the repo-provided reference answers?

Open · UbeCc opened this issue 9 months ago · 1 comment

Hi!

I'm using the default configuration of the llm_judge repo, but when I call the OpenAI API through different mirrors I get significantly different results. I'm evaluating Llama-3-8B, and the scores from the two API providers are 8.038760 and 6.827044.

That raises a question: which GPT-4 version was used to generate reference_answer/gpt-4.jsonl when the repo was released?

UbeCc · May 01 '24 02:05
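For anyone debugging this, a quick way to see whether two OpenAI-compatible endpoints actually serve the same judge model is to send a trivial request through each and compare the `model` field of the response. The sketch below is not part of FastChat; it assumes the openai>=1.0 Python SDK, and the base URLs and environment variable names are placeholders.

```python
# Minimal sketch (not part of the FastChat repo): check which GPT-4 snapshot
# two OpenAI-compatible endpoints actually serve for the generic "gpt-4" alias.
import os
from openai import OpenAI

# Placeholder endpoints; substitute the two providers/mirrors being compared.
ENDPOINTS = {
    "provider_a": os.environ.get("PROVIDER_A_BASE_URL", "https://api.openai.com/v1"),
    "provider_b": os.environ.get("PROVIDER_B_BASE_URL", "https://example-mirror.invalid/v1"),
}

for name, base_url in ENDPOINTS.items():
    client = OpenAI(base_url=base_url, api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model="gpt-4",  # generic alias; providers may map it to different snapshots
        messages=[{"role": "user", "content": "Reply with the single word: ping"}],
        temperature=0,
    )
    # resp.model reports the concrete model the endpoint used,
    # e.g. a dated snapshot such as "gpt-4-0613".
    print(f"{name}: served model = {resp.model!r}")
```

If the two endpoints print different snapshot names here, that alone would explain a systematic gap in judge scores.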

Same problem here. I also got different scores from two API providers on the same inference results generated by MiniCPM-2B-DPO-BF16: one gives 7.090625 and the other 6.025. Did you find the reason?

endxxxx · May 20 '24 11:05


No, but I guess the different API providers serve different versions of the model. It has nothing to do with FastChat.

UbeCc · May 23 '24 08:05
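Whichever snapshot was used to produce reference_answer/gpt-4.jsonl, judgments regenerated today are easier to compare across providers if the judge is pinned to a dated snapshot rather than the generic "gpt-4" alias, and run at temperature 0. Below is a minimal sketch, assuming the openai>=1.0 Python SDK; the prompt is a simplified stand-in for the actual MT-bench judge prompt, and "gpt-4-0613" is only an example snapshot, not necessarily the one used for the repo's reference answers.

```python
# Sketch of a single judging call pinned to a dated GPT-4 snapshot.
# Not the FastChat implementation; prompt and snapshot name are illustrative.
import os
import re
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def judge_once(question: str, answer: str, judge_model: str = "gpt-4-0613") -> float | None:
    """Ask a pinned judge snapshot for a 1-10 rating in [[x]] format."""
    prompt = (
        "Rate the following answer to the question on a scale of 1 to 10.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        'Output the rating strictly in the format "Rating: [[x]]".'
    )
    resp = client.chat.completions.create(
        model=judge_model,  # dated snapshot: the bare "gpt-4" alias can be remapped over time
        messages=[{"role": "user", "content": prompt}],
        temperature=0,      # reduce run-to-run variance of the judge
    )
    match = re.search(r"\[\[(\d+(?:\.\d+)?)\]\]", resp.choices[0].message.content)
    return float(match.group(1)) if match else None

if __name__ == "__main__":
    print(judge_once("What is 2 + 2?", "4"))
```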