[EVAL] Add TUMLU benchmark
Hello! We just released the benchmark for Turkic languages. Does it make sense if I add it to lighteval?
Evaluation short description
- Why is this evaluation interesting? First native-language MMLU benchmark for low-resource Turkic languages.
- How is it used in the community? It was just released; it consists of multiple-choice (MC) high-school exam questions.
Evaluation metadata
Provide all available
- Paper url: https://arxiv.org/abs/2502.11020
- Github url: https://github.com/ceferisbarov/TUMLU
- Dataset url:
cc @hynky1999, I feel this could interest you!
Is the dataset already on Hugging Face?
@clefourrier Not really (only in gated repos), but everything is already on GitHub.
Gated sounds fine, can you share the path?
Hi, I think it would be a very nice addition. We already have TurkishMMLU (which I think is also part of your dataset, right?)
- See https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks.py#L2133
To add it, we would need the following:
- Have translation literals for the languages you want to add (https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks.py#L2133)
- Add the dataset to the Hub
- Replace the TurkishMMLU with your dataset
Do you think you could do that? cc @gaydmi
@gaydmi Thank you for bringing this up!
@hynky1999 I have a question. Our dataset can be split into subsets in three ways: (a) make each language a subset, (b) make each subject a subset, (c) make each language-subject combination a subset. Which one would you suggest? I could not find any similar examples in the repo.
@hynky1999 Hi, yes, working on it! @ceferisbarov I personally think option (c) is the best, so we could just add new languages with their tasks, like here: https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks.py#L2617
Ideally, I would use one subset per language and then add a column to identify the actual task subset. You can then use the hf_filter arg on the task.
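To make the suggestion concrete, here is a minimal sketch of the "language subset plus subject column" layout in plain Python. The column name `subject`, the sample rows, and the helper `make_subject_filter` are all hypothetical, and it assumes the filter is a per-row callable that returns a bool; the actual TUMLU columns and the exact hf_filter contract should be checked before relying on this.

```python
from typing import Callable

# Hypothetical rows from a single language subset.
# The "subject" column name is an assumption, not the real TUMLU schema.
rows = [
    {"question": "Q1", "choices": ["A", "B"], "answer": 0, "subject": "physics"},
    {"question": "Q2", "choices": ["A", "B"], "answer": 1, "subject": "history"},
    {"question": "Q3", "choices": ["A", "B"], "answer": 1, "subject": "physics"},
]


def make_subject_filter(subject: str) -> Callable[[dict], bool]:
    """Return a row predicate that keeps only rows of the given subject."""

    def keep(row: dict) -> bool:
        return row["subject"] == subject

    return keep


# One predicate per subject; each task would load the same language subset
# and pass its own predicate as the row filter.
physics_filter = make_subject_filter("physics")
physics_rows = [row for row in rows if physics_filter(row)]
print([row["question"] for row in physics_rows])
```

With this layout, adding a language only means adding one subset, and each language-subject task is the same subset plus a different predicate.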
Both options sound good to me. I have added the dataset to Hugging Face:
https://huggingface.co/datasets/jafarisbarov/TUMLU-mini
@gaydmi let me know if I can help in any other way.
Awesome, cc @gaydmi happy to review the PR once ready