
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

24 instruct-eval issues

A few errors occur with [instructionBERT](https://huggingface.co/Bachstelze/instructionBERT): `python main.py drop --model_name seq_to_seq --model_path Bachstelze/instructionBERT` > Traceback (most recent call last): File "main.py", line 98, in Fire(main) File "/home/hilsenbek/.conda/envs/instruct-eval/lib/python3.8/site-packages/fire/core.py", line 141,...

How can we use the scripts in a Colab notebook? There are installation problems with or without conda, and also after a restart. > WARNING: The following packages were previously...

Why isn't the crass script in the examples? Or is there detailed documentation somewhere?

Hi, on a single 4090 GPU with 24 GB of memory, the following command causes an out-of-memory error:

```bash
python main.py mmlu --model_name llama --model_path huggyllama/llama-7b
```

After that, I try executing the...
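The out-of-memory report above is consistent with a rough back-of-the-envelope estimate: 7B parameters in full fp32 precision already exceed 24 GB on weights alone, before activations or the KV cache are counted. The helper below is a minimal sketch of that arithmetic (the function name and the parameter count are illustrative, not part of the repository):

```python
def model_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough weight-only memory footprint in GiB.

    Activations, the KV cache, and CUDA overhead come on top of this,
    so the real requirement is noticeably higher.
    """
    return n_params * bytes_per_param / 1024**3


# LLaMA-7B has roughly 7e9 parameters.
fp32_gb = model_memory_gb(7e9, 4)  # fp32: over 24 GB on weights alone
fp16_gb = model_memory_gb(7e9, 2)  # fp16: weights fit, with headroom left
```

This suggests loading the model in half precision (or a quantized format) as a plausible workaround on a 24 GB card.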

Hi there, Will it be possible to submit our own model to the leaderboard?

Hi, I found that the prompt generated from the dataset (e.g., MMLU) is not wrapped in the model's prompt template. The performance you'll get out of the model will...
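The wrapping the issue above describes can be sketched as follows. This is a minimal illustration using the widely known Alpaca-style instruction template; the template text and the `wrap_prompt` helper are assumptions for demonstration, not code from this repository:

```python
# Common Alpaca-style instruction template (assumed here for illustration).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def wrap_prompt(question: str) -> str:
    """Wrap a raw benchmark question in the model's prompt template."""
    return ALPACA_TEMPLATE.format(instruction=question)
```

Evaluating an instruction-tuned model on the raw question instead of the wrapped one can understate its performance, which may explain discrepancies like those in the reproduction reports below.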

Accuracy? Exact match? F1-score? I cannot find the description in the paper: ![image](https://github.com/declare-lab/instruct-eval/assets/8592144/72f24a75-ab17-4a07-b6db-62a8a3b74a43)

### Description

Merge `args` and `kwargs` at https://github.com/declare-lab/instruct-eval/blob/1b4f253076ce6c36309da44d82f2d8b67afc886a/modeling.py#L156 to avoid passing multiple values for the same keyword argument.

### Related Issue

https://github.com/declare-lab/instruct-eval/issues/22
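The fix described in this PR can be illustrated in isolation. When a positional argument and a keyword argument bind to the same parameter, Python raises `TypeError: got multiple values for argument ...`; binding the positionals to their parameter names first and then merging lets one side win cleanly. The `merge_args` helper below is a hypothetical sketch of that idea, not the actual change in `modeling.py`:

```python
import inspect


def merge_args(func, args, kwargs):
    """Call func with positional args folded into kwargs.

    Positionals are bound to their parameter names via the signature,
    then merged with kwargs (kwargs take precedence), so func is never
    handed two values for the same parameter.
    """
    bound = inspect.signature(func).bind_partial(*args)
    merged = {**bound.arguments, **kwargs}
    return func(**merged)
```

Letting `kwargs` take precedence is one possible policy; the PR may instead choose to raise or to prefer the positional value.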

Many thanks for your work! I tried exactly the same settings but got different results on MMLU and BBH. The Alpaca-tuned LLaMA always performs worse than the original LLaMA (7B or...

Hi, I tried to evaluate the accuracy of chavinlo/alpaca-native on MMLU. The final accuracy I get is about 36, and I cannot reproduce the reported result of about 41.6. May I ask...