lm-evaluation-harness

Add TMLU Benchmark Dataset

Open adamlin120 opened this issue 1 year ago • 1 comment

This PR adds support for the TMLU ("Measuring Taiwanese Mandarin Language Understanding," Chen et al.) benchmark dataset.

Summary

  • Adds a new dataset tmlu with 2,981 multiple-choice questions across 37 subjects
  • Uses the TMLU dataset hosted on Hugging Face
  • Supports evaluating Taiwanese Mandarin language understanding using log-likelihood multiple choice scoring
  • Includes a task for each TMLU subject, e.g. tmlu_geography and tmlu_physics
  • Enables reproducing results from the Open Taiwan LLM leaderboard
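As a sketch of how log-likelihood multiple-choice scoring works (the predicted answer is the choice whose text the model assigns the highest log-likelihood), here is a minimal illustration; the per-choice scores below are invented, and a real harness run would obtain them from the model:

```python
# Minimal sketch of log-likelihood multiple-choice scoring.
# The per-choice log-likelihoods are made up for illustration only.

def pick_choice(choice_loglikelihoods):
    """Return the index of the answer choice with the highest log-likelihood."""
    return max(range(len(choice_loglikelihoods)),
               key=lambda i: choice_loglikelihoods[i])

# One hypothetical TMLU-style question with four choices (A-D):
loglikelihoods = [-12.7, -9.3, -15.1, -11.0]  # fabricated scores for A, B, C, D
predicted = pick_choice(loglikelihoods)
print("Predicted choice:", "ABCD"[predicted])  # prints "Predicted choice: B"
```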

Checklist

Please let me know if you have any other suggestions or feedback on this PR!

adamlin120 avatar Jul 12 '24 13:07 adamlin120

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

:white_check_mark: lintangsutawika
:white_check_mark: adamlin120
:x: Yen-Ting Adam, Lin


Yen-Ting Adam, Lin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Jul 12 '24 13:07 CLAassistant

@adamlin120 made some adjustments to this PR here https://github.com/adamlin120/lm-evaluation-harness/pull/1

lintangsutawika avatar Aug 15 '24 17:08 lintangsutawika

@adamlin120 just need your help to run `pre-commit run --all-files` and it should be good!

lintangsutawika avatar Aug 19 '24 15:08 lintangsutawika