lm-evaluation-harness
Add new benchmark: Basque bench
BasqueBench is a benchmark for tasks in Basque that cover several evaluation areas. The datasets consist of professional translations of relevant English datasets and newly created datasets in Basque. The README.md contains detailed information on all the tasks included in the benchmark.
Thanks very much for this PR. I identified just a few small issues. Could you also run
pre-commit run --all-files
to fix the linting issues?
These are the changes I made:
- Added the benchmark info in lm_eval/tasks/README.md
- Replaced "-" with "_" in the create_files script in flores_eu and added weight_by_size: false
- Ran the linters
- Removed the grouping in the mgsm and copa tasks (they were pointing to pre-existing benchmarks)
With these changes, everything should be fine now. Thank you!
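For readers following the thread, the hyphen-to-underscore change in the create_files script probably amounts to something like the sketch below. The function name, the `flores` prefix, and the language codes are illustrative assumptions, not the script's actual code.

```python
def task_name(src: str, tgt: str, prefix: str = "flores") -> str:
    """Build a task name whose parts are joined by underscores only.

    Hypothetical helper: the real create_files script may build names differently.
    """
    # Join the parts with "_" and normalise any "-" left inside a language code.
    return "_".join([prefix, src, tgt]).replace("-", "_")


print(task_name("eu", "en"))
```

Keeping a single separator means task names can be matched and grouped without special-casing language pairs.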