
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

instruct-eval issues (24)

Would you support the Chinese evaluation dataset C-Eval? It would be important for Chinese LLM evaluation.
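A minimal sketch of what loading C-Eval could look like via the HuggingFace datasets library; the dataset id `ceval/ceval-exam`, the subject name, and the field names used here are assumptions, not part of this repo.

```python
# Sketch: load one C-Eval subject with the datasets library.
# Dataset id, subject, and fields below are assumptions, not part of instruct-eval.
from datasets import load_dataset

subject = "computer_network"  # hypothetical example subject
data = load_dataset("ceval/ceval-exam", subject)

# Each example is a multiple-choice question with options A-D and an answer key.
sample = data["val"][0]
prompt = (
    f"{sample['question']}\n"
    f"A. {sample['A']}\nB. {sample['B']}\nC. {sample['C']}\nD. {sample['D']}\n"
    "Answer:"
)
print(prompt, sample["answer"])
```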

[baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
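If baichuan-7B support were added, loading it would presumably go through the standard Transformers path with `trust_remote_code`, roughly as sketched below; this is a sketch, not something the repo currently does.

```python
# Sketch: loading baichuan-7B with HuggingFace Transformers.
# trust_remote_code is needed because the model ships custom modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "baichuan-inc/baichuan-7B"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```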

Good job! Could you please add multi-GPU support? Then we could test larger models, such as LLaMA 65B.
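One way multi-GPU loading could look, assuming Accelerate is installed, is to let Transformers shard the weights across available GPUs with `device_map="auto"`; the checkpoint name is only an example and this is a sketch, not the repo's implementation.

```python
# Sketch: sharding a large model (e.g. LLaMA 65B) across multiple GPUs.
# Requires `accelerate`; device_map="auto" splits layers over visible devices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "huggyllama/llama-65b"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",          # spread layers across all visible GPUs
    torch_dtype=torch.float16,  # halve memory use
)

inputs = tokenizer("Q: What is 2 + 2?\nA:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=5)[0]))
```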

The current version of the code base only returns the final evaluation metric to the user. However, it is not possible to see exactly what the model's predictions are....

Thanks for this neat repo, it is very convenient for evaluating LLMs! As a feature request, I would like to suggest adding an option to save the results of an evaluation for the...
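A rough sketch of what a prediction/result dump addressing the two requests above could look like; the function, field names, and file layout here are assumptions, not an existing option in the repo.

```python
# Sketch: saving per-example predictions alongside the final metric.
import json

def save_eval_results(path, task, predictions, labels, metric):
    """Write each example's prediction plus the aggregate metric to a JSON file."""
    records = [
        {"index": i, "prediction": p, "label": l}
        for i, (p, l) in enumerate(zip(predictions, labels))
    ]
    with open(path, "w") as f:
        json.dump({"task": task, "metric": metric, "examples": records}, f, indent=2)

# Hypothetical usage after an evaluation run:
save_eval_results("mmlu_results.json", "mmlu", ["A", "C"], ["A", "B"], 0.5)
```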

Regarding the README claim > Compared to existing libraries such as [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [HELM](https://github.com/stanford-crfm/helm), this repo enables simple and convenient evaluation for multiple models. Notably, we support most models from HuggingFace Transformers — isn't...

Hi, thank you very much for this clear code. I wonder whether you plan to integrate this code into the Transformers Trainer; that way, we could run this code during...
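One possible shape for such an integration is a `TrainerCallback` that runs the evaluation at each evaluation step; `run_instruct_eval` below is a hypothetical wrapper around this repo's evaluation code, so this is only a sketch.

```python
# Sketch: running an external evaluation inside the HuggingFace Trainer loop.
# `run_instruct_eval` is a hypothetical wrapper around this repo's evaluation code.
from transformers import TrainerCallback

class InstructEvalCallback(TrainerCallback):
    def __init__(self, run_instruct_eval, task="mmlu"):
        self.run_instruct_eval = run_instruct_eval
        self.task = task

    def on_evaluate(self, args, state, control, model=None, **kwargs):
        # Score the current model on the held-out task and log the result.
        score = self.run_instruct_eval(model, task=self.task)
        print(f"step {state.global_step}: {self.task} = {score:.4f}")

# Hypothetical usage: trainer.add_callback(InstructEvalCallback(run_instruct_eval))
```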

In newer versions of the transformers library, AutoModelForCausalLM can properly identify LLaMA models, so the LlamaModel class is no longer needed. LLaMA models run with --model_name causal. The only...
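For context, the newer behavior described above means something like the following works without a dedicated LLaMA wrapper; the checkpoint name is just an example, not one the repo prescribes.

```python
# Sketch: AutoModelForCausalLM resolving a LLaMA checkpoint directly.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "huggyllama/llama-7b"  # example checkpoint; any LLaMA-format repo works
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)  # resolves to LlamaForCausalLM

inputs = tokenizer("Instruction: say hello.\nResponse:", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```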