[EVAL] Add TUMLU benchmark

Open gaydmi opened this issue 10 months ago • 10 comments

Hello! We just released a benchmark for Turkic languages. Would it make sense for me to add it to lighteval?

Evaluation short description

  • Why is this evaluation interesting? First native-language MMLU benchmark for low-resource Turkic languages.

  • How is it used in the community? Just released; it consists of multiple-choice (MC) high-school exam questions.

Evaluation metadata

Provide all available

  • Paper url: https://arxiv.org/abs/2502.11020
  • Github url: https://github.com/ceferisbarov/TUMLU
  • Dataset url:

gaydmi avatar Feb 19 '25 15:02 gaydmi

cc @hynky1999 could interest you I feel!

clefourrier avatar Feb 19 '25 16:02 clefourrier

Is the dataset already on Hugging Face?

clefourrier avatar Feb 19 '25 16:02 clefourrier

@clefourrier Not really (only in gated repos), but everything is already on GitHub.

gaydmi avatar Feb 19 '25 16:02 gaydmi

Gated sounds fine, can you share the path?

clefourrier avatar Feb 19 '25 16:02 clefourrier

Hi, I think it would be a very nice addition. We already have TurkishMMLU (which I think is also part of your dataset, right?)

  • See https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks.py#L2133

To add it, we would need the following:

  1. Have translation literals for the languages you want to add: (https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks.py#L2133)
  2. Add the dataset to the Hub
  3. Replace the TurkishMMLU with your dataset

Do you think you could do that? cc @gaydmi

hynky1999 avatar Feb 21 '25 15:02 hynky1999

@gaydmi Thank you for bringing this up!

@hynky1999 I have a question. Our dataset can be split into subsets in three ways: (a) make each language a subset, (b) make each subject a subset, (c) make each language-subject combination a subset. Which one would you suggest? I could not find any similar examples in the repo.

ceferisbarov avatar Feb 23 '25 08:02 ceferisbarov

@hynky1999 Hi, yes, working on it! @ceferisbarov I personally think option (c) is the best, so we could just add new languages with their tasks, as is done here: https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks.py#L2617

gaydmi avatar Feb 24 '25 22:02 gaydmi

I would say the ideal setup is one subset per language, plus a column that identifies the actual task subset (the subject). You can then use the hf_filter arg on the task to select a subject.
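
A minimal sketch of that layout, under stated assumptions: the rows, the column names (`subject`, `question`, `choices`, `answer`) and the helper `make_subject_filter` are hypothetical and not the actual TUMLU schema. It does not call lighteval; it only shows the kind of row predicate that an argument like hf_filter is assumed to accept (a callable taking one row and returning a bool).

```python
from typing import Callable

# Hypothetical rows from a single language subset. The column names are
# assumptions made for illustration, not the real TUMLU schema.
rows = [
    {"subject": "biology", "question": "Q1", "choices": ["A", "B", "C", "D"], "answer": "A"},
    {"subject": "history", "question": "Q2", "choices": ["A", "B", "C", "D"], "answer": "C"},
    {"subject": "biology", "question": "Q3", "choices": ["A", "B", "C", "D"], "answer": "D"},
]


def make_subject_filter(subject: str) -> Callable[[dict], bool]:
    """Build a row predicate that keeps only rows of the given subject."""

    def keep(row: dict) -> bool:
        return row["subject"] == subject

    return keep


# One predicate per subject; each could back a separate per-subject task
# that reads from the same language subset.
biology_filter = make_subject_filter("biology")
biology_rows = [row for row in rows if biology_filter(row)]
```

With this layout, adding a language means adding one subset, and adding a subject only means adding a new value in the subject column.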

hynky1999 avatar Feb 24 '25 22:02 hynky1999

Both options sound good to me. I have added the dataset to Hugging Face:

https://huggingface.co/datasets/jafarisbarov/TUMLU-mini

@gaydmi let me know if I can help in any other way.

ceferisbarov avatar Feb 25 '25 20:02 ceferisbarov

Awesome, cc @gaydmi happy to review the PR once ready

hynky1999 avatar Feb 26 '25 12:02 hynky1999