AgentBench: Evaluation results are always 0 and differ from the leaderboard

Status: Open · lynneChan opened this issue 1 year ago · 4 comments

I want to evaluate vicuna_7b_v1.5 on the webshop task. Following configs/agents/fastchat_client.yaml, I set the agent config as follows:

module: "src.agents.FastChatAgent"
parameters:
    controller_address: "http://localhost:5000"
    max_new_tokens: 128
    temperature: 0
    top_p: 0
    model_name: "vicuna_7b_v1.5"

The vicuna_7b_v1.5 model is served with a FastChat controller and model_worker. The evaluation command is:

python eval.py --task configs/tasks/webshop/dev.yaml --agent configs/agents/fastchat_client.yaml
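One thing worth ruling out early is whether `model_name` in the agent config matches the name the model_worker actually registered with the controller. As far as I know, FastChat derives that name from the model path by default, so a worker started from a `vicuna-7b-v1.5` directory would not answer to `vicuna_7b_v1.5`. This is only one possible cause and does not by itself explain the zero reward. The minimal sketch below uses only the Python standard library and assumes the controller exposes `POST /list_models` returning `{"models": [...]}`; `list_registered_models` and `check_model_name` are hypothetical helper names, and the underscore/hyphen comparison is just a heuristic.

```python
import json
import urllib.request


def list_registered_models(controller_address: str, timeout: float = 5.0) -> list[str]:
    """Ask a FastChat controller which model names its workers registered.

    Assumption: the controller serves POST /list_models and replies
    with a JSON object of the form {"models": [...]}.
    """
    req = urllib.request.Request(
        controller_address.rstrip("/") + "/list_models",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["models"]


def check_model_name(configured: str, registered: list[str]) -> str:
    """Compare the configured model_name with the names the workers registered."""
    if configured in registered:
        return "ok"
    # Hypothetical heuristic: flag names that differ only by '_' vs '-' or by case.
    normalized = configured.replace("_", "-").lower()
    for name in registered:
        if name.replace("_", "-").lower() == normalized:
            return f"mismatch: worker registered as {name!r}"
    return "missing: no worker registered under this name"
```

With the config above, `check_model_name("vicuna_7b_v1.5", list_registered_models("http://localhost:5000"))` returns `"ok"` only if a worker registered under exactly that name.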

After the run finishes, I get the following results:

{
    "reward": 0.0,
    "format_fail_rate": 0.0125,
    "average_round": 9.9625
}
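It may help to read these numbers as per-episode averages. The sketch below assumes (hypothetically; AgentBench's actual result files and field names may differ) that each episode yields a reward in [0, 1], a format-failure flag and a round count, and that the summary simply averages them. Under that assumption, a mean reward of exactly 0.0 means every episode scored 0, because a single partial match would lift the average above zero. A format_fail_rate of 0.0125 would then correspond to, for example, 1 failed episode out of 80 (the dev split size is not stated here).

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical per-episode record; AgentBench's real output format may differ.
    reward: float        # assumed to lie in [0, 1], as WebShop rewards do
    format_failed: bool  # True if the agent's reply could not be parsed
    rounds: int          # interaction rounds used in this episode


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Average per-episode values into summary metrics like the ones reported above."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to summarize")
    return {
        "reward": sum(e.reward for e in episodes) / n,
        "format_fail_rate": sum(e.format_failed for e in episodes) / n,
        "average_round": sum(e.rounds for e in episodes) / n,
    }
```

For instance, 80 episodes that all score 0.0 summarize to a reward of 0.0, while changing just one of them to 0.5 gives 0.00625.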

The reward is always 0, which differs from the leaderboard. What went wrong? Could anyone help?

Nov 07 '23 08:11 lynneChan
