Dan Saattrup Nielsen issues

Results: 74 issues



[MODEL EVALUATION REQUEST] VAGOsolutions/SauerkrautLM-Gemma-7b

### Model ID
VAGOsolutions/SauerkrautLM-Gemma-7b
### Model type
Decoder model (e.g., GPT)
### Model languages
- [ ] Danish
- [ ] Swedish
- [ ] Norwegian (Bokmål or Nynorsk)
-...

model evaluation request

large model (>7B)

[MODEL EVALUATION REQUEST] HPLT/gpt-33b-nordic-prerelease

### Model ID
HPLT/gpt-33b-nordic-prerelease
### Model type
Decoder model (e.g., GPT)
### Model languages
- [X] Danish
- [X] Swedish
- [X] Norwegian (Bokmål or Nynorsk)
- [ ] Icelandic...

model evaluation request

large model (>7B)

[MODEL EVALUATION REQUEST] 01-ai/Yi-6B-Chat

### Model ID
01-ai/Yi-6B-Chat
### Model type
Decoder model (e.g., GPT)
### Model languages
- [X] Danish
- [X] Swedish
- [X] Norwegian (Bokmål or Nynorsk)
- [X] Icelandic
-...

model evaluation request

small model (<7B)

Support Finnish

Potential Finnish datasets to fit into the existing tasks:
- Question answering: [TyDiQA-fi](https://huggingface.co/datasets/tydiqa/viewer/primary_task/train?f[language][value]=%27finnish%27) or [PAN-X-fi](https://huggingface.co/datasets/xtreme/viewer/PAN-X.fi)
- Sentiment classification: [ScandiSent-fi](https://huggingface.co/datasets/timpal0l/scandisent/viewer/default/train?f[language][value]=%27fi%27) or [Finnish sentiment](https://huggingface.co/datasets/nisancoskun/finnish_sentiment_data)? Note that both of these are binary classification datasets!
-...

benchmark dataset request
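A binary dataset cannot exercise any class it lacks, so it only fits the existing sentiment task if its labels cover the task's labels. The minimal sketch below checks that, assuming (not confirmed by the issue) that the existing sentiment task uses the three labels `negative`, `neutral` and `positive`; `TARGET_LABELS`, `missing_labels` and `is_compatible` are hypothetical names.

```python
# Assumption: the existing sentiment task uses exactly these three labels.
TARGET_LABELS = {"negative", "neutral", "positive"}


def missing_labels(dataset_labels):
    """Return the target labels that never occur in a candidate dataset."""
    return TARGET_LABELS - set(dataset_labels)


def is_compatible(dataset_labels):
    """A candidate covers the task only if every target label occurs in it."""
    return not missing_labels(dataset_labels)


# A binary dataset lacks the neutral class.
binary_labels = ["positive", "negative", "negative", "positive"]
print(sorted(missing_labels(binary_labels)))  # ['neutral']
print(is_compatible(binary_labels))  # False
```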

Support Spanish

Potential Spanish datasets to fit into the existing tasks:
- Question answering: Maybe [XQuAD-es](https://huggingface.co/datasets/xquad/viewer/xquad.es) or [MLQA-es](https://huggingface.co/datasets/mlqa/viewer/mlqa.es.es)?
- Sentiment classification: [Spanish Targeted Sentiment Headlines](https://huggingface.co/datasets/pysentimiento/spanish-targeted-sentiment-headlines)?
- Linguistic acceptability: ScaLA algorithm with [Spanish...

benchmark dataset request
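To my understanding, ScaLA builds a linguistic acceptability dataset by corrupting grammatical sentences (taken from Universal Dependencies treebanks) and labelling the originals as correct and the corrupted copies as incorrect. The sketch below shows only the two corruption types, deleting a word or swapping two adjacent words. The real procedure restricts which words may be changed so that the result is actually ungrammatical; those restrictions are omitted here, and `corrupt_sentence` and `build_acceptability_pairs` are hypothetical helpers.

```python
import random


def corrupt_sentence(tokens, rng):
    """Return (corrupted_tokens, corruption_type) for a list of tokens.

    Simplified ScaLA-style corruption: either swap two adjacent, different
    words or delete one word. No grammatical constraints are applied.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to corrupt a sentence")
    tokens = list(tokens)  # work on a copy, never mutate the input
    # Only swaps of two different words actually change the sentence.
    swappable = [i for i in range(len(tokens) - 1) if tokens[i] != tokens[i + 1]]
    if swappable and rng.random() < 0.5:
        i = rng.choice(swappable)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
        return tokens, "swap"
    del tokens[rng.randrange(len(tokens))]
    return tokens, "delete"


def build_acceptability_pairs(sentences, seed=4242):
    """Label each original sentence as correct and its corruption as incorrect."""
    rng = random.Random(seed)  # fixed seed for a reproducible dataset
    examples = []
    for sentence in sentences:
        corrupted, _ = corrupt_sentence(sentence.split(), rng)
        examples.append((sentence, "correct"))
        examples.append((" ".join(corrupted), "incorrect"))
    return examples


print(build_acceptability_pairs(["El gato duerme en el sofá"]))
```

Whitespace splitting stands in for the treebank's own tokenisation, which a real Spanish version would use instead.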

Correlation analysis between MMLU and MMLU-mini

We only include a truncated version of MMLU in ScandEval (MMLU-mini), which consists of a stratified sample of 2,048 examples from the test split. We should conduct an experiment for...

[FEATURE REQUEST] Add finetuning of generative models

### 🚀 The feature, motivation and pitch
Evaluating generative models after finetuning would give a better idea of how well these models work when tuned to a particular...

enhancement

[FEATURE REQUEST] Include errors in logs

### 🚀 The feature, motivation and pitch
When evaluating models, the model's output for a sample sometimes cannot be evaluated correctly. For instance, this can happen when we're evaluating a NER task...

enhancement

good first issue
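A minimal sketch of what logging such failures could look like, assuming the model is prompted to answer the NER task with a JSON object mapping entity types to lists of entities. `parse_ner_output` and the logger name are hypothetical; the point is that the sample index and the raw output end up in the log instead of being dropped silently.

```python
import json
import logging

logger = logging.getLogger("ner_eval")  # hypothetical logger name


def parse_ner_output(raw_output, sample_idx):
    """Parse a generated NER answer; log the sample and return {} on failure.

    Assumes the model was asked to answer with a JSON object mapping entity
    types to lists of entity strings.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        logger.warning(
            "Sample %d: model output is not valid JSON (%s). Raw output: %r",
            sample_idx, exc, raw_output,
        )
        return {}
    if not isinstance(parsed, dict):
        logger.warning(
            "Sample %d: expected a JSON object, got %s. Raw output: %r",
            sample_idx, type(parsed).__name__, raw_output,
        )
        return {}
    return parsed


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(parse_ner_output('{"person": ["Dan"]}', sample_idx=0))  # {'person': ['Dan']}
    print(parse_ner_output("person: Dan", sample_idx=1))  # logs a warning, prints {}
```

Returning an empty dict treats an unparseable answer as "no entities found", so the evaluation can continue while the log records exactly which samples failed and why.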

[BUG] Better error message when `hf_transfer` is missing

### 🐛 Describe the bug
When evaluating with `HF_HUB_ENABLE_HF_TRANSFER=1` set but `hf_transfer` not installed, an error is raised, which currently just states that the config can't be loaded. This should...

bug

good first issue
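A sketch of a check that could run before any model download, so the user sees the real cause instead of a config-loading error. It reads `HF_HUB_ENABLE_HF_TRANSFER`, the variable `huggingface_hub` uses to switch on `hf_transfer`, and treats `1`, `on`, `yes` and `true` (case-insensitive) as enabled, which to my knowledge mirrors how `huggingface_hub` parses boolean variables. `check_hf_transfer` is a hypothetical helper, not part of ScandEval or `huggingface_hub`.

```python
import importlib.util
import os

# Values treated as "enabled" (assumed to match huggingface_hub's parsing).
_TRUTHY = {"1", "on", "yes", "true"}


def check_hf_transfer(env=None, installed=None):
    """Raise a clear ImportError if hf_transfer is requested but missing.

    `env` defaults to os.environ; `installed` defaults to checking whether
    the `hf_transfer` package can be found on the import path.
    """
    env = os.environ if env is None else env
    requested = env.get("HF_HUB_ENABLE_HF_TRANSFER", "").strip().lower() in _TRUTHY
    if not requested:
        return
    if installed is None:
        installed = importlib.util.find_spec("hf_transfer") is not None
    if not installed:
        raise ImportError(
            "HF_HUB_ENABLE_HF_TRANSFER is set, but the `hf_transfer` package is "
            "not installed. Install it with `pip install hf_transfer`, or unset "
            "the environment variable."
        )
```

The `env` and `installed` parameters exist only so the check can be exercised without touching the real environment; in normal use, calling `check_hf_transfer()` with no arguments is enough.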

[BUG] Cannot load the `ThatsGroes/Munin-NeuralBeagle-SkoleGPT-instruct` model

### 🐛 Describe the bug
When benchmarking the `ThatsGroes/Munin-NeuralBeagle-SkoleGPT-instruct` model, the following error occurs:
```
Traceback (most recent call last):
  File "/home/ubuntu/.venv/bin/scandeval", line 8, in <module>
    sys.exit(benchmark())
  File "/home/ubuntu/.venv/lib/python3.10/site-packages/click/core.py", line 1157,...
```

bug