FastChat
GPT-4-turbo MMLU scores?
Hi all, maybe there's an obvious reason why this can't be done, but it would be really great to have MMLU scores for the GPT-4-turbo models on the LMSYS leaderboard. I'm not sure whether the license prohibits it, but it would be very nice to be able to compare their performance on a benchmark against Claude Opus, and even against the older GPT-4.