Dan Saattrup Smart comments

Results: 240 comments of Dan Saattrup Smart

Bias evaluation

Bias benchmark for generation: https://arxiv.org/abs/2503.06987

Bias evaluation

@Rijgersberg Still working on the methodology; there should be some news within a month's time. But it's quite interesting to note that measuring bias on raw generations seems to be different from measuring...

Bias evaluation

@Rijgersberg Can share that I've successfully come up with a reasonable scalable methodology, and used it to generate datasets for Danish and Dutch, which have been reviewed by my peers...

Bias evaluation

Maybe relevant to bias evaluation: https://arctotherium.substack.com/p/llm-exchange-rates-updated

fix: Make `convert_token_to_string` pickleable

@rlouf, does this seem good to you?

[Bug]: Very slow guided decoding with Outlines backend since v0.6.5

> This is probably related to #9032

Should note that that issue happened before vLLM v0.6.5 was even released, so it seems like the two issues _might_ be different.

[MODEL EVALUATION REQUEST] Qwen3-4B-Thinking-2507

Missing evaluations for the following languages:

- [ ] Bosnian (bs)
- [ ] Croatian (hr)
- [ ] Greek (el)
- [ ] Latvian (lv)
- [ ] Lithuanian...

[MODEL EVALUATION REQUEST] speakleash/Bielik-11B-v2.3-Instruct

Done! Results live on [the leaderboards](https://scandeval.com) now 🎉

[MODEL EVALUATION REQUEST] speakleash/Bielik-11B-v2.3-Instruct

@djstrong Re-opening to add Finnish 🙂

[BENCHMARK DATASET REQUEST] LongBench v2

English benchmarks are fine for the English benchmarks - translated ones can be added to the others as well. Perhaps you wanna try to implement this one yourself? We have...