mteb Some datasets for languages.

I'm gonna practice drums for the rest of the day and probably won't work tomorrow, but for those who are looking to contribute and get some of those juicy points here is some low-hanging fruit in diverse languages:

Slovak:

~~Sentiment: https://huggingface.co/datasets/sepidmnorozy/Slovak_sentiment~~ (as a matter of fact, she has loads of sentiment classification datasets: https://huggingface.co/sepidmnorozy)
News Summarization: https://huggingface.co/datasets/kiviki/SlovakSum

Greek:

~~Legal code clustering: https://huggingface.co/datasets/AI-team-UoA/greek_legal_code~~
NLI: https://huggingface.co/datasets/Harsit/xnli2.0_greek
Medical QA: https://huggingface.co/datasets/ilsp/medical_mcqa_greek

Maltese:

News titles: https://huggingface.co/datasets/MLRS/maltese_news_headlines
News categories: https://huggingface.co/datasets/MLRS/maltese_news_categories

Apr 18 '24 10:04 x-tabdeveloping

I'm gonna pick up kiviki/SlovakSum if no one is on it yet.

Apr 22 '24 16:04 dokato

On the other hand, it seems like the summary task requires:

        human_summaries: list[str]
        machine_summaries: list[str]
        relevance: list[float] (the score of the machine generated summaries)

and kiviki/SlovakSum has neither machine_summaries nor relevance scores.

Apr 24 '24 09:04 dokato

@dokato Try formulating it as a retrieval task instead :))
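
For anyone picking this up, here is a rough sketch of what that reformulation could look like: each summary becomes a query, each article becomes a corpus document, and the only relevant document for a query is the article it summarizes. The column names `text` and `summary`, the `to_retrieval` helper, and the `d0`/`q0` ID scheme are all assumptions made for illustration, so check the dataset card and the existing retrieval tasks in the repo before relying on any of it.

```python
from typing import TypedDict


class Record(TypedDict):
    # Hypothetical column names; the real dataset may use different ones.
    text: str
    summary: str


def to_retrieval(records: list[Record]):
    """Turn (article, summary) pairs into corpus / queries / relevance dicts."""
    corpus: dict[str, dict[str, str]] = {}
    queries: dict[str, str] = {}
    relevant_docs: dict[str, dict[str, int]] = {}
    for i, rec in enumerate(records):
        doc_id, query_id = f"d{i}", f"q{i}"
        # The article is the document to be retrieved.
        corpus[doc_id] = {"title": "", "text": rec["text"]}
        # The human-written summary acts as the query.
        queries[query_id] = rec["summary"]
        # Exactly one relevant document per query: the summarized article.
        relevant_docs[query_id] = {doc_id: 1}
    return corpus, queries, relevant_docs


# Toy records standing in for rows of the real dataset.
toy: list[Record] = [
    {"text": "Long article about elections.", "summary": "Election news."},
    {"text": "Long article about football.", "summary": "Football news."},
]
corpus, queries, relevant_docs = to_retrieval(toy)
```

This needs no machine summaries or relevance scores, since the article/summary pairing itself supplies the ground truth.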

Apr 24 '24 11:04 x-tabdeveloping

I can start working on the Maltese datasets if no one else is.

May 08 '24 11:05 wissam-sib

@wissam-sib Please verify that no one has added them yet or is working on a PR, otherwise feel free to go ahead :D

May 08 '24 12:05 x-tabdeveloping

News categories is being added so I'm gonna go for the NLI one

May 08 '24 12:05 wissam-sib

I will take care of Greek medical QA: https://huggingface.co/datasets/ilsp/medical_mcqa_greek

May 20 '24 16:05 mariyahendriksen

Will close this issue for now. I assume many of these are still relevant to add; if so, we should probably create separate PRs for them.

@mariyahendriksen do you still want to add the Greek medical QA?

Sep 09 '24 15:09 KennethEnevoldsen