AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Results 46 AgentBench issues

If the agent issues a command like `while true; do ls /root; sleep 1; done`, it will loop forever while continuously producing output (meaning the socket never times out)...
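One way to guard against this kind of command is to pair the per-read idle timeout with an overall wall-clock deadline, so that a command which keeps printing still gets cut off. The following is a minimal sketch, not AgentBench's actual code: the function name `read_with_deadline` and its parameters are hypothetical, and it assumes the terminal output arrives on an ordinary Python socket.

```python
import socket
import time


def read_with_deadline(sock, idle_timeout=1.0, total_timeout=5.0, chunk_size=4096):
    """Read until the peer goes idle, closes, or the overall deadline passes.

    Hypothetical helper for illustration. Returns (data, timed_out), where
    timed_out is True only if the overall deadline cut the read short.
    """
    deadline = time.monotonic() + total_timeout
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Output kept arriving, but the wall-clock budget is spent.
            return b"".join(chunks), True
        # Never wait longer than what is left of the overall budget.
        limited_by_deadline = remaining < idle_timeout
        sock.settimeout(min(idle_timeout, remaining))
        try:
            chunk = sock.recv(chunk_size)
        except socket.timeout:
            # A plain idle timeout means the command finished printing;
            # a timeout shortened by the deadline counts as a hard stop.
            return b"".join(chunks), limited_by_deadline
        if not chunk:
            # Peer closed the connection.
            return b"".join(chunks), False
        chunks.append(chunk)
```

With only an idle timeout, the loop in the example command resets the timer every second and the read never returns. With the extra deadline, the caller gets back whatever was collected plus a flag it can use to kill the command and report a timeout to the agent.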

Currently, the terminal output parsing stops when it first sees an escape symbol. However, it appends the whole packet received from the socket, which does not necessarily correspond...
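A possible fix is to buffer incoming packets and cut the buffer at the position of the escape marker, rather than appending each packet whole. The sketch below illustrates that idea under stated assumptions: the class name `OutputCollector` is hypothetical, and the marker is assumed to be the ESC byte (`\x1b`), which may differ from what the benchmark actually checks for.

```python
ESCAPE = b"\x1b"  # assumed marker; the real parser may look for something else


class OutputCollector:
    """Accumulate socket packets and keep only the bytes before the marker."""

    def __init__(self, marker=ESCAPE):
        self.marker = marker
        self.buffer = bytearray()
        self.done = False

    def feed(self, packet):
        """Add one packet; return True once the marker has been seen."""
        if self.done:
            return True
        self.buffer.extend(packet)
        # Search the whole buffer so a multi-byte marker that straddles
        # two packets is still found.
        index = self.buffer.find(self.marker)
        if index != -1:
            # Drop the marker and everything after it in this sketch.
            del self.buffer[index:]
            self.done = True
        return self.done

    def output(self):
        return bytes(self.buffer)
```

Packet boundaries on a stream socket are arbitrary, so the marker can land anywhere inside a packet. Truncating at the marker's index keeps the collected output independent of how the bytes happened to be chunked.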

**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: 1. Go to '...' 2. Click on '....' 3. Scroll...

bug
help wanted

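Hello, I ran into a problem with DBbench-std: it cannot connect to MySQL, while the other tasks run normally. I installed the required dependencies as instructed and ran docker pull mysql, and everything went fine. Only running python -m src.start_task -a reports an error; dbbench-std and os-std can both execute, but the dbbench-std results are abnormal. (They are all {"role": "agent", "content": "Action: Answer\nFinal Answer: []"}) Thank you. Error details: INFO: Started server process [738313] INFO: Waiting for application startup. INFO: Application...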

bug
help wanted

Hi there, Thank you for the great contributions! There have been many new models released since the benchmark was published. Do you have any plans to include some of these...

This issue is related to a [previous one](https://github.com/THUDM/AgentBench/issues/29). For the Knowledge Graph task, the agent seems to be providing the correct answer but the feedback insists that the answer is...

bug
help wanted

**Describe the bug** Could you please upload the dockerfile? That would mean a lot! **To Reproduce** None **Screenshots or Terminal Copy&Paste** None **Desktop (please complete the following information):** None **Additional...

bug
help wanted

**Describe the bug** A large number of the os-std tasks in 7/bootstrap.json are impossible for the agents to complete, as they refer to a "given folder" which is at...

bug
help wanted

Hello, I am using fastchat to load the chatglm3-6b model. step1 `python3 -m fastchat.serve.controller` step2 `python3 -m fastchat.serve.model_worker --model-path /ldata/llms/chatglm3-6b` step3 `python3 -m fastchat.serve.openai_api_server --host 10.0.1.227 --port 30008` After starting the services, I modified the fs_agents.yaml file so that its content is ``` default: module: "src.client.agents.FastChatAgent" parameters: name: "FastChat" controller_address: "http://10.0.1.227:30008" max_new_tokens:...

bug
help wanted

The reference answer doesn't follow the description