lm-evaluation-harness
Add TMLU Benchmark Dataset
This PR adds support for the TMLU ("Measuring Taiwanese Mandarin Language Understanding" by Chen et al.) benchmark dataset.
Summary
- Adds a new dataset `tmlu` with 2,981 multiple-choice questions across 37 subjects
- Uses the TMLU dataset hosted on Hugging Face
- Supports evaluating Taiwanese Mandarin language understanding using log-likelihood multiple-choice scoring (a sketch of this scoring follows the list)
- Includes tasks for each TMLU subject, e.g. `tmlu_geography`, `tmlu_physics`, etc.
- Enables reproducing results from the Open Taiwan LLM leaderboard
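For context on the scoring bullet above, here is a minimal sketch of log-likelihood multiple-choice scoring: the model scores each answer choice as a continuation of the question prompt, and the prediction is the choice with the highest total log-probability. The model name, prompt template, and example question below are illustrative placeholders, not the prompt this PR actually uses; the harness itself batches these requests and handles tokenization more carefully.

```python
# Minimal sketch of log-likelihood multiple-choice scoring (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def choice_loglikelihood(context: str, continuation: str) -> float:
    """Sum of log-probs the model assigns to `continuation` given `context`.

    Simplification: tokenizing context and context+continuation separately can
    mismatch at the token boundary; the harness handles this more carefully.
    """
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    # Each continuation token is predicted by the logits one position earlier.
    return sum(
        log_probs[0, pos - 1, full_ids[0, pos]].item()
        for pos in range(ctx_len, full_ids.shape[1])
    )

question = "台灣最高的山是？"  # "What is the highest mountain in Taiwan?"
choices = {"A": "玉山", "B": "阿里山", "C": "合歡山", "D": "雪山"}
scores = {k: choice_loglikelihood(f"{question}\n答案：", v) for k, v in choices.items()}
prediction = max(scores, key=scores.get)  # accuracy = (prediction == gold label)
```

Once merged, running a subject task should look something like `lm_eval --model hf --model_args pretrained=<model> --tasks tmlu_geography`.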
Checklist
- [x] Referenced the original TMLU paper
- [x] Checked the TMLU reference implementation
- [x] Verified the Hugging Face dataset matches the data used in the TMLU paper (a spot-check sketch follows this checklist)
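On the last checklist item, a spot check of the hosted copy can be as simple as loading each subject and comparing sizes and fields against the paper. The repo id and config name below are assumptions for illustration, not identifiers verified against this PR:

```python
# Hypothetical spot check of the Hugging Face copy of TMLU. The repo id
# "miulab/tmlu" and the config name "geography" are assumptions; substitute
# whatever identifiers the task configs in this PR actually reference.
from datasets import load_dataset

subset = load_dataset("miulab/tmlu", "geography", split="test")
print(len(subset))          # compare with the per-subject count in the paper
print(subset.column_names)  # e.g. the question, choices, and answer fields
```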
Please let me know if you have any other suggestions or feedback on this PR!
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.
✅ lintangsutawika
✅ adamlin120
❌ Yen-Ting Adam, Lin
Yen-Ting Adam, Lin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
@adamlin120 made some adjustments to this PR here: https://github.com/adamlin120/lm-evaluation-harness/pull/1
@adamlin120 just need your help to run `pre-commit run --all-files` and it should be good!