AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Results 67 AgentBench issues

**Describe the bug** The site's certificate has expired. **To Reproduce** Go to llmbench.ai. **Screenshots or Terminal Copy&Paste**

bug
help wanted

Is there an alternative to Docker, such as podman or k8s? I don't have sudo privileges.

enhancement

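**Describe the bug** As far as I can tell, the OS environment is currently packaged with Docker, and Python's Docker.py is then used to interact with it. Is there any documentation on how to operate the OS environment? Something like an RL environment: which actions can be executed at each step, what feedback the environment returns, and what the reward is at each step. I would like to extract the OS interaction code on its own and then adapt it to our own tasks……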

bug
help wanted

Hello, do you make the model trajectories (and their interactions with the system) available? I couldn't find them. Thank you!

enhancement

**Describe the bug** While testing the ltp environment, I ran into an INTERACT_FAILED problem, and the progress bar stays at 0. **Screenshots or Terminal Copy&Paste** **Desktop (please complete the following information):** - OS: MacOS - Python: 3.9

bug
help wanted

The operating system dataset has the wrong start script. 1. dev.json "description": "what is the output if i execute ~/test?" The code should not be run as the init script but as start...

bug
help wanted