oliverkinch
[EurLexSum-la](https://huggingface.co/datasets/dennlinger/eur-lex-sum) only has 959 / 187 / 188 train, val and test samples, respectively, and after filtering by `MIN_NUM_CHARS_IN_ARTICLE` and `MAX_NUM_CHARS_IN_ARTICLE` only 9 / 0 / 0 samples remain.
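For reference, the filter step amounts to something like the sketch below. The bound values and the `reference` column name are assumptions for illustration; the real bounds live in the benchmark's dataset-creation script.

```python
# Minimal sketch of the character-length filter described above.
# Constant values are assumed, not the benchmark's actual settings.
MIN_NUM_CHARS_IN_ARTICLE = 1_000
MAX_NUM_CHARS_IN_ARTICLE = 10_000


def within_bounds(sample: dict) -> bool:
    """Keep a sample only if its article length is inside the bounds."""
    n = len(sample["reference"])  # "reference" = full article text (assumed)
    return MIN_NUM_CHARS_IN_ARTICLE <= n <= MAX_NUM_CHARS_IN_ARTICLE


# Toy split standing in for the EurLexSum-la train split.
train = [
    {"reference": "x" * 500},     # too short
    {"reference": "x" * 5_000},   # within bounds
    {"reference": "x" * 50_000},  # too long
]
filtered = [s for s in train if within_bounds(s)]
print(len(filtered))  # only the middle sample survives
```

With Latin legal texts this bound check is what collapses the splits to 9 / 0 / 0, so loosening the bounds (or skipping the filter for this language) would be the obvious knob to turn.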
I can't seem to find the LAG-MMLU dataset anywhere. In section 3.2.1 (Dataset collection) of their paper, they describe creating three versions:

1. The original English questions (baseline)
2. Machine...
> I can't seem to find the LAG-MMLU dataset anywhere. In section 3.2.1 (Dataset collection) of their paper, they describe creating three versions:
>
> 1. The original English questions...
>
> But I can't find any actual reference to where the translated dataset is available. Do you know where it might be accessible?
>
> Paper link: https://aclanthology.org/2025.nodalida-1.12.pdf
>...
For [COPA-lv](https://github.com/LUMII-AILab/VTI-Data/tree/main/copa) we have 400 / 100 / 500 train, val and test samples, but the test set comes without labels, so we only have the 500 labelled train and val samples. Is...
> There is a [Latvian citizenship test](https://www.pmlp.gov.lv/en/examinations-determined-citizenship-law?utm_source=https%3A%2F%2Flivelatvia.lv%2F). If previous tests are available then that could be used as a knowledge dataset.

This does not seem to be the case. I...
I have added the Latvian MMLU now. I can begin working on whichever of your suggestions you consider most relevant to prioritise first.

> As for the reading comprehension dataset,...
> hey, was notified of this now. didnt give any language identifier, but its pretty easy to do that based on newsroom

Are you familiar with which newsrooms correspond...
Hi @Linguistcoder, I just had a look at your dataset, but I cannot open https://github.com/kuhumcst/danish-semantic-reasoning-benchmark/blob/main/similarity/similarity.zip as I don't have the required password. Would you be able to share the...
Also having this problem for Danish audio.

Edit: I don't encounter the problem when using large-v2 with VAD instead of large-v3.
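For anyone hitting the same issue, the large-v2 + VAD combination can be sketched with faster-whisper; the audio path is a placeholder and the exact decoding parameters are assumptions, not the settings used above.

```python
from faster_whisper import WhisperModel

# Sketch only: large-v2 with the built-in Silero VAD filter, which
# avoided the problem described above for Danish audio.
model = WhisperModel("large-v2")
segments, info = model.transcribe(
    "danish_sample.wav",  # placeholder path
    language="da",
    vad_filter=True,      # drop non-speech regions before decoding
)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```

The VAD filter removes long silent or noisy stretches before decoding, which is a common way to suppress the hallucinated/looping output some large-v3 runs produce on such segments.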