Add KoCommonGEN v2 benchmark
Description: This PR adds support for KoCommonGEN v2, a benchmark for evaluating Korean commonsense reasoning in large language models.
Changes:
- Added the KoCommonGEN v2 task definition (see the config sketch after this list)
- Updated task list to include ko_commongen_v2
- Added citation information for the benchmark
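For reviewers unfamiliar with the harness's declarative task format, here is a minimal sketch of what the `ko_commongen_v2` YAML definition might look like. Only the task name and `dataset_path` come from this PR; the split name and the `question`/`choices`/`answer` column names are placeholders that would need to match the actual Hugging Face dataset schema.

```yaml
# Illustrative sketch of an lm-evaluation-harness task config for
# KoCommonGEN v2. Column and split names below are assumptions and
# must be checked against the nlpai-lab/ko_commongen_v2 dataset.
task: ko_commongen_v2
dataset_path: nlpai-lab/ko_commongen_v2   # from this PR
output_type: multiple_choice
test_split: test                          # assumption: split name
doc_to_text: "{{question}}"               # assumption: prompt column
doc_to_choice: "{{choices}}"              # assumption: list of answer options
doc_to_target: "{{answer}}"               # assumption: index of the gold option
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
```

Once merged, the task should be runnable like any other, e.g. `lm_eval --model hf --model_args pretrained=<model> --tasks ko_commongen_v2`.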
KoCommonGEN v2 Details:
- Paper: "KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models"
- Accepted to Findings of ACL 2024
- GitHub: https://github.com/J-Seo/KoCommonGEN-V2
- Dataset: https://huggingface.co/datasets/nlpai-lab/ko_commongen_v2
This benchmark provides a valuable resource for evaluating Korean language models on commonsense reasoning tasks. Adding it to the evaluation suite will help broaden the harness's multilingual coverage.
Please review and let me know if you need any changes or additional information.
Hi @metterian, just following up to see if you'd be able to make these final few changes so we can merge this task! If not, we'll try to get to them ourselves.
Note also that we'd ideally have an entry in https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/README.md describing the task as well, so users can discover it!
Hi @metterian. Sorry for the delay. We just need you to sign the CLA; if you agree, we can merge this.
@metterian bumping!