
Add KoCommonGEN v2 benchmark

Open metterian opened this issue 1 year ago • 4 comments

Description: This PR adds support for the KoCommonGEN v2 benchmark, a new dataset for evaluating Korean commonsense reasoning in large language models.

Changes:

  • Added KoCommonGEN v2 task definition
  • Updated task list to include ko_commongen_v2
  • Added citation information for the benchmark

KoCommonGEN v2 Details:

  • Paper: "KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models"
  • Accepted to Findings of ACL 2024
  • GitHub: https://github.com/J-Seo/KoCommonGEN-V2
  • Dataset: https://huggingface.co/datasets/nlpai-lab/ko_commongen_v2
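
For reviewers who want a quick look at the data, here's a minimal sketch using the `datasets` library (the split names and column layout printed are whatever the dataset ships with; this sketch makes no assumptions about the schema beyond the dataset ID above):

```python
# Minimal sketch: inspect the KoCommonGEN v2 dataset from the Hugging Face Hub.
# NOTE: split names and columns are not confirmed by this PR; see the dataset
# card at https://huggingface.co/datasets/nlpai-lab/ko_commongen_v2 for details.
from datasets import load_dataset

ds = load_dataset("nlpai-lab/ko_commongen_v2")
print(ds)  # prints the available splits and their features

first_split = next(iter(ds))
print(ds[first_split][0])  # first example of the first split
```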

This benchmark provides a valuable resource for evaluating Korean language models on commonsense reasoning tasks. Adding it to our evaluation suite will help broaden our coverage of multilingual NLP capabilities.
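
Once merged, users could run the task through the harness's Python API. A minimal sketch follows; the model used (EleutherAI/pythia-160m) is an arbitrary illustrative choice, and the exact metric keys reported will depend on the task definition added in this PR:

```python
# Minimal sketch: evaluate a small HF model on ko_commongen_v2 with lm-evaluation-harness.
# NOTE: the model is an arbitrary example; swap in any model the harness supports.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["ko_commongen_v2"],
    batch_size=8,
)
print(results["results"]["ko_commongen_v2"])  # per-task metrics
```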

Please review and let me know if any changes or additional information are needed.

metterian avatar Aug 12 '24 07:08 metterian

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Aug 12 '24 07:08 CLAassistant

Hi @metterian, just following up to see if you'd be able to make these final few changes so we can merge this task! If not, we'll try to get to them ourselves.

Note also that we'd ideally like an entry in https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/README.md describing the task as well, so users know about it!

haileyschoelkopf avatar Aug 28 '24 13:08 haileyschoelkopf

Hi @metterian. Sorry for the delay. We just need you to sign the CLA; if you agree, we can merge this.

baberabb avatar Jan 28 '25 19:01 baberabb

@metterian bumping!

baberabb avatar Mar 28 '25 12:03 baberabb