oliverkinch

12 comments by oliverkinch

[EurLexSum-la](https://huggingface.co/datasets/dennlinger/eur-lex-sum) only has 959 / 187 / 188 train / val / test samples, respectively, and after filtering by `MIN_NUM_CHARS_IN_ARTICLE` and `MAX_NUM_CHARS_IN_ARTICLE` only 9 / 0 / 0 samples are left.
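For reference, a minimal sketch of the kind of length filter involved, using the Hugging Face `datasets` API. The threshold values are placeholders rather than the repo's actual settings, and the `latvian` config name and `reference` column are assumptions about the dataset layout:

```python
from datasets import load_dataset

# Placeholder thresholds -- the real MIN/MAX values live in the repo's config.
MIN_NUM_CHARS_IN_ARTICLE = 1_000
MAX_NUM_CHARS_IN_ARTICLE = 100_000

# Config name "latvian" and column name "reference" are assumptions.
ds = load_dataset("dennlinger/eur-lex-sum", "latvian")
for split in ("train", "validation", "test"):
    filtered = ds[split].filter(
        lambda x: MIN_NUM_CHARS_IN_ARTICLE
        <= len(x["reference"])
        <= MAX_NUM_CHARS_IN_ARTICLE
    )
    print(f"{split}: {len(ds[split])} -> {len(filtered)}")
```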

I can't seem to find the LAG-MMLU dataset anywhere. In section 3.2.1 (Dataset collection) of their paper, they describe creating three versions:

1. The original English questions (baseline)
2. Machine...

> But I can't find any actual reference to where the translated dataset is available. Do you know where it might be accessible?
>
> Paper link: https://aclanthology.org/2025.nodalida-1.12.pdf
> ...

For [COPA-lv](https://github.com/LUMII-AILab/VTI-Data/tree/main/copa) we have 400 / 100 / 500 train / val / test samples, but no labels are given for the test set. Hence we only have 500 labelled samples. Is...
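To double-check the missing test labels, something like the following sketch could be used. It assumes the splits are JSON files with an optional `label` field, which may not match the actual VTI-Data layout:

```python
# Sketch for counting labelled samples per split. The file names and the
# "label" field are assumptions about the VTI-Data layout.
import json
from pathlib import Path

for split in ("train", "val", "test"):
    samples = json.loads(Path(f"copa/{split}.json").read_text())
    labelled = sum("label" in sample for sample in samples)
    print(f"{split}: {len(samples)} samples, {labelled} labelled")
```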

> There is a [Latvian citizenship test](https://www.pmlp.gov.lv/en/examinations-determined-citizenship-law?utm_source=https%3A%2F%2Flivelatvia.lv%2F). If previous tests are available then that could be used as a knowledge dataset.

This does not seem to be the case. I...

I have added the Latvian MMLU now. I can begin working on whichever of your suggestions you consider the highest priority.

> As for the reading comprehension dataset, ...

> hey, was notified of this now. didn't give any language identifier, but it's pretty easy to do that based on newsroom

Are you familiar with which newsrooms correspond...

Hi @Linguistcoder, I just had a look at your dataset, but I cannot open https://github.com/kuhumcst/danish-semantic-reasoning-benchmark/blob/main/similarity/similarity.zip as I don't have the required password. Would you be able to share the...

Also having this problem for Danish audio.

Edit: I don't see the problem when using large-v2 with VAD instead of plain large-v3.
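For context, a minimal sketch of that working combination, assuming the faster-whisper library (which exposes a `vad_filter` option on `transcribe`); the audio path is a placeholder:

```python
# Sketch of the large-v2 + VAD combination that avoided the issue,
# assuming faster-whisper; "audio.wav" is a placeholder path.
from faster_whisper import WhisperModel

model = WhisperModel("large-v2")
segments, info = model.transcribe(
    "audio.wav",
    language="da",     # Danish
    vad_filter=True,   # skip non-speech stretches via voice activity detection
)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```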