Dan Saattrup Smart

Results 240 comments of Dan Saattrup Smart

Bias benchmark for generation: https://arxiv.org/abs/2503.06987

@Rijgersberg Still working on the methodology, should be some news within a month's time. But it's quite interesting to note that measuring bias on raw generations seems to be different from measuring...

@Rijgersberg Can share that I've successfully come up with a reasonable scalable methodology, and used it to generate datasets for Danish and Dutch, which have been reviewed by my peers...

Maybe relevant to bias evaluation: https://arctotherium.substack.com/p/llm-exchange-rates-updated

@rlouf, does this seem good to you?

> This is probably related to #9032

Should note that that issue happened before vLLM v0.6.5 was even released, so it seems like the two issues _might_ be different.

Missing evaluations for the following languages:

- [ ] Bosnian (bs)
- [ ] Croatian (hr)
- [ ] Greek (el)
- [ ] Latvian (lv)
- [ ] Lithuanian...

Done! Results live on [the leaderboards](https://scandeval.com) now 🎉

@djstrong Re-opening to add Finnish 🙂

English benchmarks are fine for the English benchmarks - translated ones can be added to the others as well. Perhaps you wanna try to implement this one yourself? We have...