oliverkinch

12 comments by oliverkinch

[EurLexSum-la](https://huggingface.co/datasets/dennlinger/eur-lex-sum) only has 959 / 187 / 188 train / val / test samples, respectively, and after filtering by `MIN_NUM_CHARS_IN_ARTICLE` and `MAX_NUM_CHARS_IN_ARTICLE` only 9 / 0 / 0 samples are left.
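For reference, a minimal sketch of the kind of length filter involved, using the Hugging Face `datasets` API. The threshold values are placeholders rather than the repo's actual settings, and the `latvian` config name and `reference` column are assumptions about the dataset layout:

```python
from datasets import load_dataset

# Placeholder thresholds -- the real MIN/MAX values live in the repo's config.
MIN_NUM_CHARS_IN_ARTICLE = 1_000
MAX_NUM_CHARS_IN_ARTICLE = 100_000

# Config name "latvian" and column name "reference" are assumptions.
ds = load_dataset("dennlinger/eur-lex-sum", "latvian")
for split in ("train", "validation", "test"):
    filtered = ds[split].filter(
        lambda x: MIN_NUM_CHARS_IN_ARTICLE
        <= len(x["reference"])
        <= MAX_NUM_CHARS_IN_ARTICLE
    )
    print(f"{split}: {len(ds[split])} -> {len(filtered)}")
```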

I can't seem to find the LAG-MMLU dataset anywhere. In section 3.2.1 (Dataset collection) of their paper, they describe creating three versions:

1. The original English questions (baseline)
2. Machine...

> But I can't find any actual reference to where the translated dataset is available. Do you know where it might be accessible?
>
> Paper link: https://aclanthology.org/2025.nodalida-1.12.pdf
> ...

For [COPA-lv](https://github.com/LUMII-AILab/VTI-Data/tree/main/copa) we have 400 / 100 / 500 train / val / test samples, but no labels are given for the test set. Hence we only have 500 labelled samples. Is...
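To double-check the missing test labels, something like the following sketch could be used. It assumes the splits are JSON files with an optional `label` field, which may not match the actual VTI-Data layout:

```python
# Sketch for counting labelled samples per split. The file names and the
# "label" field are assumptions about the VTI-Data layout.
import json
from pathlib import Path

for split in ("train", "val", "test"):
    samples = json.loads(Path(f"copa/{split}.json").read_text())
    labelled = sum("label" in sample for sample in samples)
    print(f"{split}: {len(samples)} samples, {labelled} labelled")
```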

> There is a [Latvian citizenship test](https://www.pmlp.gov.lv/en/examinations-determined-citizenship-law?utm_source=https%3A%2F%2Flivelatvia.lv%2F). If previous tests are available then that could be used as a knowledge dataset.

This does not seem to be the case. I...

I have added the Latvian MMLU now. I can begin working on whichever of your suggestions you consider the highest priority.

> As for the reading comprehension dataset, ...

> hey, was notified of this now. didn't give any language identifier, but it's pretty easy to do that based on newsroom

Are you familiar with which newsrooms correspond...

Hi @Linguistcoder, I just had a look at your dataset, but I cannot open https://github.com/kuhumcst/danish-semantic-reasoning-benchmark/blob/main/similarity/similarity.zip as I don't have the required password. Would you be able to share the...

Also having this problem for Danish audio.

Edit: I don't see the problem when using large-v2 with VAD instead of plain large-v3.
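For context, a minimal sketch of that working combination, assuming the faster-whisper library (which exposes a `vad_filter` option on `transcribe`); the audio path is a placeholder:

```python
# Sketch of the large-v2 + VAD combination that avoided the issue,
# assuming faster-whisper; "audio.wav" is a placeholder path.
from faster_whisper import WhisperModel

model = WhisperModel("large-v2")
segments, info = model.transcribe(
    "audio.wav",
    language="da",     # Danish
    vad_filter=True,   # skip non-speech stretches via voice activity detection
)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```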