cosmopedia questions about evaluation like MMLU

Open ftgreat opened this issue 6 months ago • 0 comments

Thank you for sharing.

Common benchmarks such as MMLU are typically run in a 5-shot setting to measure a model's in-context learning ability.

Can you explain why your MMLU evaluation uses a zero-shot approach with the option content instead of the usual 5-shot setting?
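For concreteness, the two setups being compared could be sketched roughly as below. This is only an illustration under assumptions, not the evaluation code behind the blog: it assumes "zero-shot plus option content" means scoring the text of each option as a continuation of the question (a cloze-style formulation), and the prompt templates and the names `five_shot_prompt`, `zero_shot_cloze_candidates`, `pick_option`, and `loglikelihood` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

LETTERS = "ABCD"

# (question, options, index of the correct option)
Example = Tuple[str, Sequence[str], int]


def format_mc(question: str, options: Sequence[str], answer: Optional[int] = None) -> str:
    """Render one multiple-choice item; the answer letter is filled in only for shots."""
    lines = [question]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:" if answer is None else f"Answer: {LETTERS[answer]}")
    return "\n".join(lines)


def five_shot_prompt(shots: Sequence[Example], question: str, options: Sequence[str]) -> str:
    """Few-shot setup: solved examples first, then the test item.
    The model is expected to continue with an answer letter."""
    blocks = [format_mc(q, o, a) for q, o, a in shots]
    blocks.append(format_mc(question, options))
    return "\n\n".join(blocks)


def zero_shot_cloze_candidates(question: str, options: Sequence[str]) -> List[Tuple[str, str]]:
    """Zero-shot setup (assumed): no examples and no letters.
    Each option's content is a candidate continuation to be scored."""
    context = f"Question: {question}\nAnswer:"
    return [(context, f" {opt}") for opt in options]


def pick_option(candidates: Sequence[Tuple[str, str]],
                loglikelihood: Callable[[str, str], float]) -> int:
    """Return the index of the continuation with the highest score.
    `loglikelihood(context, continuation)` is a placeholder for a real model call."""
    scores = [loglikelihood(ctx, cont) for ctx, cont in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

In the first setup the model has to produce the right letter after seeing solved examples, while in the second it only has to rank the option texts, so the two scores are not necessarily comparable across models.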

According to your blog, under this setup your model's MMLU scores are higher than those of QWen1.5B and the Phi models, whereas under 5-shot evaluation the conclusion is reversed. Is this reasonable? Thank you.

Aug 13 '24 09:08 ftgreat
