Yann Dubois

52 comments by Yann Dubois

Todo:
- [ ] add a signature to the CSV leaderboard so that people can report it in papers and make sure results are comparable.
- [ ] print the signature...

I'm closing as I don't think that I will be able to add this feature unfortunately.

This is very surprising indeed. Just to understand, why are you not using the default alpaca_eval 2? i.e. `alpaca_eval evaluate_from_model --model_configs 'mistral-7b-orpo'` Is the issue that you don't have access...

My bad @qingquansong, use `alpaca_eval evaluate_from_model --model_configs 'mistral-7b-orpo' --annotators_config 'alpaca_eval_gpt4_turbo_fn'`, which doesn't require logprobs.

@hungchiayu1 that's very surprising, what are the two deployment names and how do they differ?

@qingquansong are you using the OpenAI API directly? My guess in all the above is that the issue comes from using the wrong models & API deployment. Please run it...

That's strange... The model that I used to get Table 5 (i.e. 93.6 on CIFAR10) is `dissl_resnet50_d8192_e400_m6`. Can you check whether you can reproduce the results with it?

I saw the PR, it looks great and homogeneity definitely makes sense. Adding AlpacaEval might require a few changes for homogenization though. The pipeline for AlpacaEval at a high level...

Great to know that there's a place for a corpus-level function. I can write a minimal `length_controlled_mean` when the time comes. Let me know if you have questions for...

Hey @clefourrier! So the current JudgeOpenAI still seems pretty specialized to MT-bench. E.g. it makes a few assumptions that will not be true for AlpacaEval and more generally for other...