SimPO
About evaluating SimPO-v0.2 with Arena-Hard
Hi, I tried to evaluate the Llama-3-Instruct-8B-SimPO-v0.2 checkpoint with Arena-Hard-Auto, and I only got:
Llama-3-Instruct-8B-SimPO-v0.2 | score: 35.4 | 95% CI: (-3.2, 2.0) | average #tokens: 530
while your paper reports 36.5.
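For context, here is roughly how I ran the evaluation; this is a sketch of my local invocation, and the script and config names follow the arena-hard-auto README, so exact paths may differ across versions:

# Register the vLLM endpoint in config/api_config.yaml and list the model
# in the generation config, then run the standard three steps.
python gen_answer.py      # query the served model for answers
python gen_judgment.py    # run the judge model over the generated answers
python show_result.py     # print scores with the bootstrapped 95% CI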
So I am wondering whether my vLLM API server settings are right:
python3 -m vllm.entrypoints.openai.api_server \
--model path-to-SimPO-v0.2 \
--host 0.0.0.0 --port 5001 --served-model-name SimPO-v0.2 \
--chat-template templates/llama3.jinja
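As a sanity check, I also queried the server directly through its OpenAI-compatible endpoint to confirm the chat template is applied and the served model name matches my Arena-Hard-Auto config; the host, port, and model name below just mirror the command above:

# Minimal request against the running server; expects a normal chat reply.
curl http://localhost:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "SimPO-v0.2", "messages": [{"role": "user", "content": "Hello"}], "temperature": 0}'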