SimPO
About evaluating SimPO-v0.2 with Arena-Hard
Hi, I tried to evaluate the Llama-3-Instruct-8B-SimPO-v0.2 checkpoint with Arena-Hard-Auto, and I only got:
Llama-3-Instruct-8B-SimPO-v0.2 | score: 35.4 | 95% CI: (-3.2, 2.0) | average #tokens: 530
while your paper reports 36.5.
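For context, here is roughly how I ran the evaluation; this is a sketch of my local invocation, and the script and config names follow the arena-hard-auto README, so exact paths may differ across versions:

# Register the vLLM endpoint in config/api_config.yaml and list the model
# in the generation config, then run the standard three steps.
python gen_answer.py      # query the served model for answers
python gen_judgment.py    # run the judge model over the generated answers
python show_result.py     # print scores with the bootstrapped 95% CI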
So I am wondering whether my vLLM API server settings are right:
python3 -m vllm.entrypoints.openai.api_server \
--model path-to-SimPO-v0.2 \
--host 0.0.0.0 --port 5001 --served-model-name SimPO-v0.2 \
--chat-template templates/llama3.jinja
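As a sanity check, I also queried the server directly through its OpenAI-compatible endpoint to confirm the chat template is applied and the served model name matches my Arena-Hard-Auto config; the host, port, and model name below just mirror the command above:

# Minimal request against the running server; expects a normal chat reply.
curl http://localhost:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "SimPO-v0.2", "messages": [{"role": "user", "content": "Hello"}], "temperature": 0}'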