mlmm-evaluation Few Shot configuration

Open Nkluge-correa opened this issue 6 months ago • 0 comments

Hello!

Is there a way to control how many few-shot examples are used when evaluating the models? Also, how are the evaluations currently set up? Do all benchmarks (ARC, MMLU, HellaSwag) run zero-shot? If not, what configuration is used?

Aug 09 '24 12:08 Nkluge-correa
