Dan Saattrup Smart
Bias benchmark for generation: https://arxiv.org/abs/2503.06987
@Rijgersberg Still working on methodology, should be some news within a month's time. But it's quite interesting to note that measuring bias on raw generations seems to be different from measuring...
@Rijgersberg Can share that I've successfully come up with a reasonable scalable methodology, and used it to generate datasets for Danish and Dutch, which have been reviewed by my peers...
Maybe relevant to bias evaluation: https://arctotherium.substack.com/p/llm-exchange-rates-updated
@rlouf, does this seem good to you?
> This is probably related to #9032

Should note that that issue happened before vLLM v0.6.5 was even released, so it seems like the two issues _might_ be different.
Missing evaluations for the following languages:
- [ ] Bosnian (bs)
- [ ] Croatian (hr)
- [ ] Greek (el)
- [ ] Latvian (lv)
- [ ] Lithuanian...
Done! Results live on [the leaderboards](https://scandeval.com) now 🎉
@djstrong Re-opening to add Finnish 🙂
English benchmarks are fine for the English benchmarks - translated ones can be added to the others as well. Perhaps you wanna try to implement this one yourself? We have...