instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
A few errors occur with [instructionBERT](https://huggingface.co/Bachstelze/instructionBERT): `python main.py drop --model_name seq_to_seq --model_path Bachstelze/instructionBERT` > Traceback (most recent call last): File "main.py", line 98, in Fire(main) File "/home/hilsenbek/.conda/envs/instruct-eval/lib/python3.8/site-packages/fire/core.py", line 141,...
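The traceback above is cut off, so the root cause is hard to pin down from this excerpt. One way to narrow it down is to check whether the checkpoint loads with the seq-to-seq auto classes outside the harness; a minimal sketch, assuming only `transformers` is installed and that the `seq_to_seq` model_name maps onto these classes:

```python
# Sketch: check whether the checkpoint loads with the seq2seq auto classes
# outside the evaluation harness, to separate model issues from harness issues.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "Bachstelze/instructionBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)
print(type(model).__name__)  # which concrete architecture was resolved

# A generation attempt; if this also fails, the problem is in the checkpoint
# or its config rather than in the evaluation code.
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
print(tokenizer.decode(
    model.generate(**inputs, max_new_tokens=16)[0], skip_special_tokens=True
))
```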
How can we use the scripts in a Colab notebook? There are installation problems both with and without conda, and also after a restart. > WARNING: The following packages were previously...
Why isn't the crass script in the examples? Or is there detailed documentation somewhere?
Hi, on a single 4090 GPU with 24GB of memory, the following command causes an out-of-memory error. ```bash python main.py mmlu --model_name llama --model_path huggyllama/llama-7b ``` After that, I try executing the...
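For reference, a LLaMA-7B checkpoint is roughly 13-14 GB in fp16 and about 27 GB in fp32, so on a 24 GB card there is little headroom once activations and the evaluation batch are added, and fp32 loading will not fit at all. The harness appears to expose a `--load_8bit` flag for the llama model_name (check the README and `modeling.py` to confirm); the same effect can be reproduced directly with `transformers` and `bitsandbytes`. A minimal sketch, with the model name taken from the command above:

```python
# Sketch: load LLaMA-7B in 8-bit so the weights take ~7 GB instead of ~13 GB (fp16).
# Requires the `bitsandbytes` and `accelerate` packages alongside `transformers`.
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    load_in_8bit=True,   # quantize linear layers to int8 on load
    device_map="auto",   # place weights on the available GPU
)
```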
Hi there, will it be possible to submit our own model to the leaderboard?
Hi, I found that the prompt generated from the dataset (e.g., MMLU) is not wrapped in the model's prompt template. The performance you'll get out of the model will...
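For instruction-tuned checkpoints such as Alpaca, scores can shift noticeably depending on whether the benchmark question is fed raw or wrapped in the template the model was fine-tuned on. A minimal sketch of wrapping an MMLU-style question in the Alpaca prompt format, for illustration only (the exact template the harness should apply depends on the model):

```python
# Sketch: wrap a raw MMLU-style question in the Alpaca instruction template
# before scoring. The template text is the standard Alpaca (no-input) format;
# other instruction-tuned models expect different wrappers.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def wrap_prompt(question, choices):
    options = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", choices))
    instruction = f"{question}\n{options}\nAnswer with the letter of the correct option."
    return ALPACA_TEMPLATE.format(instruction=instruction)

print(wrap_prompt("What is the capital of France?",
                  ["Berlin", "Paris", "Madrid", "Rome"]))
```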
Which metric is reported: accuracy, exact match, or F1-score? I cannot find a description in the paper:
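For context, multiple-choice benchmarks such as MMLU are conventionally reported as plain accuracy: the predicted option letter is compared to the gold letter by exact match. A minimal sketch of that scoring, as an illustration only; the repository's own metric code should be checked, especially for generation-style tasks like DROP, which are usually scored with exact match and token-level F1:

```python
# Sketch: exact-match accuracy over predicted option letters, the usual
# metric for multiple-choice benchmarks such as MMLU.
def exact_match_accuracy(predictions, references):
    assert len(predictions) == len(references)
    correct = sum(
        p.strip().upper() == r.strip().upper()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)

print(exact_match_accuracy(["A", "c", "B"], ["A", "C", "D"]))  # 2/3 correct
```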
### Description Merge arguments and kwargs at https://github.com/declare-lab/instruct-eval/blob/1b4f253076ce6c36309da44d82f2d8b67afc886a/modeling.py#L156 to avoid passing multiple values for the same keyword argument. ### Related Issue https://github.com/declare-lab/instruct-eval/issues/22.
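The underlying failure mode is generic Python: if a parameter is supplied both by the wrapper and again by the caller, the call raises `TypeError: ... got multiple values for argument ...`. Merging the positional arguments into one keyword dict, with the caller's values taking precedence, means every parameter is passed exactly once. A hypothetical sketch of the idea, not the repository's actual `modeling.py` code:

```python
# Sketch of the "multiple values for keyword argument" clash and the merge fix.
import inspect

def generate(prompt, max_new_tokens=32, temperature=1.0):
    return f"{prompt!r} (max_new_tokens={max_new_tokens}, temperature={temperature})"

def call_unsafe(*args, **kwargs):
    # Fails whenever the caller also supplies max_new_tokens, positionally or
    # as a keyword: TypeError: got multiple values for argument 'max_new_tokens'.
    return generate(*args, max_new_tokens=64, **kwargs)

def call_merged(*args, **kwargs):
    # Bind positional args to parameter names, then let the caller's kwargs win,
    # so every parameter reaches generate() exactly once.
    params = list(inspect.signature(generate).parameters)
    merged = {**dict(zip(params, args)), "max_new_tokens": 64, **kwargs}
    return generate(**merged)

print(call_merged("hello", max_new_tokens=16))  # caller's value overrides the 64
```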
Many thanks for your work! I tried exactly the same settings but got different results on MMLU and BBH. The Alpaca-tuned LLaMA always performs worse than the original LLaMA (7B or...
Hi, I tried to evaluate the accuracy of chavinlo/alpaca-native on MMLU. I find the final accuracy is about 36, and I cannot reproduce the reported result of about 41.6. May I ask...