Dan Saattrup Smart comments

Results: 240 comments by Dan Saattrup Smart

[MODEL EVALUATION REQUEST] mistralai/Mistral-Small-3.1-24B-Instruct-2503

vLLM support has [been merged](https://github.com/vllm-project/vllm/pull/15505), so we're just waiting for the next vLLM release.

[MODEL EVALUATION REQUEST] mistralai/Mistral-Small-3.1-24B-Instruct-2503

Live on the leaderboards now 🎉

[BENCHMARK DATASET REQUEST] Mideind Icelandic QA

This could be a great dataset to add to the benchmark. Whether it should be the default reading comprehension dataset depends on a few points, however: 1. We currently use...

[BENCHMARK DATASET REQUEST] Mideind Icelandic QA

Sure thing. There's still the matter of my second point above. But since the focus of the dataset is on Icelandic culture, it would probably be a great fit for...

[BENCHMARK DATASET REQUEST] Mideind Icelandic QA

@thorunna Fair enough - I can look into it in a few weeks. But note that if we convert it to a multiple-choice QA dataset then we should get...

[BENCHMARK DATASET REQUEST] Danish Similarity Outlier Detection

Looks good! We could formulate it as a multiple-choice task with 6 choices, in which case it fits in with the existing tasks. This would fit in the knowledge category...
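As a rough illustration of the 6-choice formulation mentioned above, here is a minimal sketch. The sample layout (six words, one of which is the outlier), the function name `to_multiple_choice`, and the prompt wording are all assumptions made for the example; they are not taken from the dataset or from the benchmark's code.

```python
import random

# Six option labels, matching the "6 choices" mentioned in the comment.
LABELS = "abcdef"


def to_multiple_choice(words, outlier, seed=0):
    """Build a 6-option prompt and its gold label from a hypothetical sample.

    `words` is assumed to hold exactly six candidates, one of which is
    `outlier`. This layout is an assumption, not the dataset's real schema.
    """
    if len(words) != len(LABELS):
        raise ValueError(f"Expected {len(LABELS)} words, got {len(words)}")
    if outlier not in words:
        raise ValueError("The outlier must be one of the candidate words")

    # Shuffle deterministically so the outlier's position is not predictable.
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)

    options = [f"{label}. {word}" for label, word in zip(LABELS, shuffled)]
    prompt = "Which word does not belong with the others?\n" + "\n".join(options)
    gold_label = LABELS[shuffled.index(outlier)]
    return prompt, gold_label


prompt, gold_label = to_multiple_choice(
    ["hund", "kat", "hest", "ko", "gris", "bil"], outlier="bil"
)
print(prompt)
print("Gold label:", gold_label)
```

Shuffling with a fixed seed keeps the example reproducible while avoiding a constant position for the correct answer.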

[BUG] Generative model finetuning when running evaluation

@usarth Can you try running the evaluation with the `--verbose` and `--raise-errors` flags? It seems like it doesn't properly recognise that your model is generative.

[BUG] Generative model finetuning when running evaluation

Hi @usarth. I don't have a laptop on me until Wednesday, but one thing that might be wrong is that the pipeline tag hasn't been set to "text-generation". Of course,...
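One way to check the pipeline tag yourself is sketched below. It assumes the model is hosted on the Hugging Face Hub and that the `huggingface_hub` package is installed; the helper name `has_text_generation_tag` and the `fetch_info` parameter are made up for this example, and this is not necessarily how the benchmark itself detects generative models.

```python
from types import SimpleNamespace


def has_text_generation_tag(repo_id, fetch_info=None):
    """Return True if the model's pipeline tag is "text-generation".

    `fetch_info` is a hypothetical hook for offline use; by default the
    metadata is fetched from the Hugging Face Hub (needs network access).
    """
    if fetch_info is None:
        # Imported lazily so the helper can be used without the package
        # when a custom `fetch_info` is supplied.
        from huggingface_hub import model_info

        fetch_info = model_info
    return fetch_info(repo_id).pipeline_tag == "text-generation"


# Offline demonstration with a stand-in for the Hub response.
def fake_fetch(repo_id):
    return SimpleNamespace(pipeline_tag="text-generation")


print(has_text_generation_tag("some-user/some-model", fetch_info=fake_fetch))
```

Calling `has_text_generation_tag("<your-model-id>")` without `fetch_info` queries the Hub directly.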

[BUG] Generative model finetuning when running evaluation

Hi again @usarth. This should be fixed in the newest version now (v13.1.0). If it still doesn't work on your end, please re-open this issue.

[BUG] Issue with locally hosted LLM Servers

Hi @c-barakat, and thanks for raising the issue. It is definitely _meant_ to work with custom inference APIs, so this is a bug and not a feature 🙂 I think...