Add KoCommonGEN v2 benchmark
Description: This PR adds support for KoCommonGEN v2, a benchmark for evaluating Korean commonsense reasoning in large language models.
Changes:
- Added the KoCommonGEN v2 task definition (see the config sketch after this list)
- Updated task list to include ko_commongen_v2
- Added citation information for the benchmark
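For reviewers unfamiliar with the harness's declarative task format, here is a minimal sketch of what the `ko_commongen_v2` YAML definition might look like. Only the task name and `dataset_path` come from this PR; the split name and the `question`/`choices`/`answer` column names are placeholders that would need to match the actual Hugging Face dataset schema.

```yaml
# Illustrative sketch of an lm-evaluation-harness task config for
# KoCommonGEN v2. Column and split names below are assumptions and
# must be checked against the nlpai-lab/ko_commongen_v2 dataset.
task: ko_commongen_v2
dataset_path: nlpai-lab/ko_commongen_v2   # from this PR
output_type: multiple_choice
test_split: test                          # assumption: split name
doc_to_text: "{{question}}"               # assumption: prompt column
doc_to_choice: "{{choices}}"              # assumption: list of answer options
doc_to_target: "{{answer}}"               # assumption: index of the gold option
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
```

Once merged, the task should be runnable like any other, e.g. `lm_eval --model hf --model_args pretrained=<model> --tasks ko_commongen_v2`.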
KoCommonGEN v2 Details:
- Paper: "KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models"
- Accepted to Findings of ACL 2024
- GitHub: https://github.com/J-Seo/KoCommonGEN-V2
- Dataset: https://huggingface.co/datasets/nlpai-lab/ko_commongen_v2
This benchmark provides a valuable resource for evaluating Korean language models on commonsense reasoning tasks. Adding it to the evaluation suite will help broaden the harness's multilingual coverage.
Please review and let me know if you need any changes or additional information.
Hi @metterian, just following up to see if you'd be able to make these final few changes so we can merge this task! If not, we'll try to get to them ourselves.
Note also that we'd ideally have an entry in https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/README.md describing the task as well, so users can discover it!
Hi @metterian. Sorry for the delay. We just need you to sign the CLA; if you agree, we can merge this.
@metterian bumping!