alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
I am running with 8 A40 GPUs and I think it should be fast. I set up the environment and ran ``` alpaca_eval evaluate_from_model --model_configs 'robin-v2-7b' --annotators_config 'claude' ``` and...
according to `git status`, no leaderboard file was updated.
```
              win_rate  standard_error  n_total
minotaur-13b     67.64            1.64      805
```
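For context on the `standard_error` column above, here is a minimal sketch of how a win rate and its standard error can be computed from per-example preferences, assuming `standard_error` is the standard error of the mean preference (1.0 = win, 0.5 = tie, 0.0 = loss); this is an illustration, not AlpacaEval's exact implementation:

```python
import math

def win_rate_stats(preferences):
    """Return (win_rate_pct, standard_error_pct, n) from per-example
    preferences, where each entry is 1.0 (win), 0.5 (tie), or 0.0 (loss)."""
    n = len(preferences)
    mean = sum(preferences) / n
    # Standard error of the mean, using the sample (n-1) variance.
    var = sum((p - mean) ** 2 for p in preferences) / (n - 1)
    se = math.sqrt(var / n)
    return mean * 100, se * 100, n

# Toy example: 3 wins, 1 tie, 1 loss out of 5 comparisons.
rate, se, n = win_rate_stats([1.0, 1.0, 1.0, 0.5, 0.0])
```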
Model outputs and GPT-4 eval results are here: https://huggingface.co/datasets/jondurbin/airoboros-gpt4-1.2-alpaca-eval/tree/main
I want to propose adding a version signature to AlpacaEval a la [sacreBLEU signatures](https://github.com/mjpost/sacrebleu?tab=readme-ov-file#version-signatures) and explicit instructions for reporting scores to improve reproducibility. For those unfamiliar with the sacreBLEU metric,...
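To make the proposal concrete, a sacreBLEU-style signature could be a single pipe-delimited string capturing everything needed to reproduce a score. The field names below (annotator, baseline, dataset, version) are purely illustrative, not an existing AlpacaEval API:

```python
def alpaca_eval_signature(annotator, baseline, dataset, version):
    """Build a hypothetical sacreBLEU-style version signature string
    encoding the evaluation configuration for reporting alongside scores."""
    fields = [
        f"annotator:{annotator}",
        f"baseline:{baseline}",
        f"dataset:{dataset}",
        f"version:{version}",
    ]
    return "AlpacaEval|" + "|".join(fields)

# Example (config names are illustrative):
sig = alpaca_eval_signature(
    "weighted_alpaca_eval_gpt4_turbo",
    "gpt4_turbo",
    "alpaca_eval_gpt4_baseline",
    "0.6",
)
```

A reported score would then carry its signature, so readers can tell at a glance which annotator, baseline, and dataset version produced it.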
GPT-4 is very expensive when we have to run hundreds of experiments for scientific studies. I wonder whether you have tried using Llama3-70b, which performs comparably to the older version...
Hey Team, We're running some experiments with Mistral 7B ORPO and variants, but found that using GPT-4-1106-preview as the baseline + OpenAI GPT-4 judgment produces overly high results: ``` INFO:root:Not saving...
We would like to add [Aligner-2B+GPT-4 Turbo (04/09)](https://github.com/AlignInc/aligner-replication) to AlpacaEval 2.0. It is a reproduction of the paper [Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction](https://arxiv.org/pdf/2402.02416.pdf). Thank you for such...
It would be really nice if Microsoft's new [Phi 3 models](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) could be added to the AlpacaEval Leaderboard.
Hello, I am creating this PR to share an example of evaluating a local model via API calls (vLLM server). I find this approach can be quite useful when: -...
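A rough sketch of the setup described in this PR, shown as a command fragment; the model name and port are placeholders, and the exact alpaca_eval config keys needed to point the annotator/model at the local endpoint may differ from what the PR actually uses:

```shell
# Launch vLLM's OpenAI-compatible server (model name is illustrative).
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.2 \
    --port 8000

# Point OpenAI-style clients at the local server instead of api.openai.com.
export OPENAI_BASE_URL=http://localhost:8000/v1
export OPENAI_API_KEY=EMPTY  # vLLM does not check the key by default
```

With the server up, any tooling that speaks the OpenAI chat/completions API can query the local model, which avoids per-token API costs during large evaluation sweeps.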