AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Results 67 AgentBench issues

**Title**: Revise Prompts to Comply with OpenAI API Policy **Description**: ### Background Recent updates to the OpenAI API have introduced stricter content filtering policies, causing some of our existing prompts...

As of February 2025, a large number of more capable LLMs have emerged.

enhancement

Hi AgentBench Team, thanks for your awesome effort in constructing this benchmark. I would like to ask whether you have added, or plan to add, the experimental results of large reasoning models...

enhancement

I am trying to run webshop-std, but it reports that the task does not exist. May I ask why this happens? ![error](https://github.com/user-attachments/assets/648631af-541c-4c3a-a8b1-d461bd86191b) ![error 2](https://github.com/user-attachments/assets/6972df6b-10d0-4602-8b9d-1b538259c976) My config is as follows:...

bug
help wanted

In data/os_interaction/data/dev.json, the example code for task "Find out count of linux users on this system who belong to at least 4 groups." is incorrect. The current example checks for...

bug
help wanted
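For reference, the task quoted in the issue above ("Find out count of linux users on this system who belong to at least 4 groups.") can be sketched in Python. This is only an illustrative sketch, not the example code from `dev.json` and not the fix proposed in the issue. The function name `count_users_with_min_groups` is hypothetical, and the sketch assumes that a user's primary group (from `/etc/passwd`) counts alongside the supplementary groups listed in `/etc/group`; the benchmark's intended definition may differ.

```python
from collections import defaultdict


def count_users_with_min_groups(passwd_text: str, group_text: str, min_groups: int = 4) -> int:
    """Count users whose primary plus supplementary groups number at least min_groups.

    Hypothetical helper for illustration; takes the file contents as strings
    so it can be exercised without touching the real system files.
    """
    memberships = defaultdict(set)  # user name -> set of group IDs

    # /etc/passwd format: name:password:uid:gid:gecos:home:shell
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) >= 4 and not line.startswith("#"):
            memberships[fields[0]].add(fields[3])  # primary group (assumed to count)

    known_users = set(memberships)

    # /etc/group format: name:password:gid:member1,member2,...
    for line in group_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) >= 4 and not line.startswith("#"):
            for member in filter(None, fields[3].split(",")):
                if member in known_users:  # ignore members with no passwd entry
                    memberships[member].add(fields[2])

    return sum(1 for gids in memberships.values() if len(gids) >= min_groups)
```

On a real system the two arguments could be read from `/etc/passwd` and `/etc/group`; users defined only through other name services (for example LDAP) would not be seen by this sketch.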

**Describe the bug** The following code is used to check whether the input string is an entity: ```python...

bug
help wanted

I want to view the UI shown in the demo video. Does anyone know how I can do this?

bug
help wanted

Has anyone run into error 100 on the docker build? ``` docker build -f data/os_interaction/res/dockerfiles/default data/os_interaction/res/dockerfiles --tag local-os/default ``` ``` 1.987 At least one invalid signature was encountered. 2.082 Get:3...

bug
help wanted