instruct-eval
Regarding the comparison to lm-evaluation-harness
Regarding this claim from the README:

> Compared to existing libraries such as evaluation-harness and HELM, this repo enables simple and convenient evaluation for multiple models. Notably, we support most models from HuggingFace Transformers.

isn't

```
python main.py mmlu --model_name llama --model_path some-llama
```

roughly the same as

```
python main.py --model_args pretrained=some-llama,... --tasks hendrycksTest* --num_fewshot 5
```

in lm-evaluation-harness? There is also `python scripts/regression.py --models multiple-models --tasks multiple-tasks` for evaluating multiple models across multiple tasks. lm-evaluation-harness likewise supports most HuggingFace models, as well as some OpenAI and Anthropic models.
The answer to your question is that this "library" is an uncredited fork of lm-evaluation-harness whose commit history has been scrubbed.