Dan Saattrup Smart

Results: 240 comments by Dan Saattrup Smart

As for the reading comprehension dataset, maybe the [FoQA recipe](https://arxiv.org/abs/2502.07642) could be used with Latvian Wikipedia?

Latvian MMLU: https://arxiv.org/abs/2503.11911

@oliverkinch Thanks for checking. Let's leave out summarisation for Latvian then, until we find a better one.

@oliverkinch The [FullStack-NER](https://github.com/LUMII-AILab/FullStack) dataset is probably a better NER dataset than the quite easy WikiANN dataset. But I see that you've already added WikiANN, which is fine, but...

For summarisation, we could potentially scrape the [LSM.lv](https://www.lsm.lv/) website for news articles and use the first paragraph as the summary, as usual. We shouldn't re-publish that dataset publicly, however, for...
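A minimal sketch of the "first paragraph as the summary" split mentioned above, assuming the article text has already been obtained and that paragraphs are separated by blank lines. The function name `split_lead_paragraph` is hypothetical and only for illustration; it does not do any scraping and is not taken from any existing codebase.

```python
import re
from typing import Optional, Tuple


def split_lead_paragraph(article_text: str) -> Optional[Tuple[str, str]]:
    """Split an article into (summary, body), using the first paragraph as the summary.

    Assumes paragraphs are separated by one or more blank lines.
    Returns None if there are fewer than two non-empty paragraphs,
    since there would be nothing left to summarise.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", article_text) if p.strip()]
    if len(paragraphs) < 2:
        return None
    summary = paragraphs[0]
    body = "\n\n".join(paragraphs[1:])
    return summary, body
```

Articles with only a single paragraph are skipped rather than turned into a sample, since the summary and the text would otherwise be identical.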

> But I can't find any actual reference to where the translated dataset is available. Do you know where it might be accessible?
>
> Paper link: https://aclanthology.org/2025.nodalida-1.12.pdf
>
>...

> Unfortunately, the number of questions in the Latvian exams are very limited:

Ah, I see, that's a shame 🙁 Yeah in that case the MMLU-lv is a better bet....

> I can begin working on whichever of your suggestions you consider most relevant to prioritise first.

The FoQA recipe thing has already been done, that's MultiWikiQA-lv, which already exists....

@Mikeriess @s-smits This is a really old issue at this point, but are you still encountering this?

@Mikeriess Thanks, live now 🙂 Looking forward to the other languages!