AgentBench
Evaluation reward is always 0 and differs from the leaderboard
I want to evaluate vicuna_7b_v1.5 on the webshop task. In configs/agents/fastchat_client.yaml,
the agent config is set as follows:
module: "src.agents.FastChatAgent"
parameters:
    controller_address: "http://localhost:5000"
    max_new_tokens: 128
    temperature: 0
    top_p: 0
    model_name: "vicuna_7b_v1.5"
The vicuna_7b_v1.5 model is served through a FastChat controller and model_worker. The evaluation command is:
python eval.py --task configs/tasks/webshop/dev.yaml --agent configs/agents/fastchat_client.yaml
After the run finishes, I get the following results:
{
    "reward": 0.0,
    "format_fail_rate": 0.0125,
    "average_round": 9.9625
}
The reward is always 0, which does not match the leaderboard score for this model. What could be going wrong? Any help would be appreciated.
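
One thing that may be worth ruling out is a mismatch between model_name in the agent config and the name the worker actually registered with the controller. Below is a minimal sketch of such a check. It assumes the FastChat controller exposes a POST /list_models endpoint that returns JSON of the form {"models": [...]}; the helper names list_registered_models and is_model_registered are hypothetical and not part of AgentBench or FastChat.

```python
import json
import urllib.request


def list_registered_models(controller_address, timeout=10.0):
    """Ask the controller which model names its workers have registered.

    Assumption: the controller answers POST /list_models with
    {"models": ["name1", "name2", ...]}.
    """
    request = urllib.request.Request(
        controller_address.rstrip("/") + "/list_models",
        data=b"",  # empty body; the request is sent as POST
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload.get("models", [])


def is_model_registered(model_name, registered_models):
    """Exact string match, so "vicuna_7b_v1.5" and "vicuna-7b-v1.5" differ."""
    return model_name in registered_models


def check(controller_address, model_name):
    """Print whether model_name is known to the controller."""
    registered = list_registered_models(controller_address)
    if is_model_registered(model_name, registered):
        print(f"{model_name!r} is registered with the controller")
    else:
        print(f"{model_name!r} not found; controller reports: {registered}")
```

With the config above, this would be called as check("http://localhost:5000", "vicuna_7b_v1.5"). If the name is not found, the name reported by the controller is the one to compare against model_name in the agent config.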