AgentBench issues

[Bug/Assistance] card game: evaluating an open-source LLM fails with error INTERACT_FAILED {"detail":"Error: Worker not responding\n"}

**Describe the bug** Every run fails when it reaches cg-std#14, where the assigner reports:

> Warning: Qwen2-72B-Instruct/cg-std#14 failed with error INTERACT_FAILED {"detail":"Error: Worker not responding\n"} index=None status= result=None history=None

The start_task process prints:

> except sending except message sent

After the run completes, only an error.jsonl result is produced. From a quick look, the error appears to start around this location: https://github.com/THUDM/AgentBench/blob/57b982b10f782661b1346b2234c5ed463f6f85c3/src/server/tasks/card_game/server.py#L38...

moon-fall

bug

help wanted

Urgent: if one of the problems throws an error, why doesn't overall.json show up?

ishapuri

bug

help wanted

Are the trajectories publicly available?

Hello, have you released the corresponding trajectories for each task, including both the human and the LLM ones? I couldn't find them in the paper. Thanks.

yanan1116

[Assistance] How do I reproduce the effect shown in the demo video?

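I can already run the dbbench task successfully, and the results are written to output as expected. How can I reproduce what the demo video shows, where you watch the agent operate the database in the terminal in real time?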

XGJ111

bug

help wanted

WebShop scenario: why do some searches return no results, causing the task to fail?

**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd...

kai0705

enhancement

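[Feature] Questions about the Docker setup for the game scenario: errors referencing http://nginx.org/r/error_log. Is this caused by the Docker container having no access to the external network?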

![image](https://github.com/user-attachments/assets/1740ecff-5e8b-4814-b910-4ade1466a158)

kai0705

enhancement

DBbench: the results of executed MySQL commands never contain any tables

2

{ "index": 298, "error": null, "info": null, "output": { "index": 298, "status": "completed", "result": { "answer": "1049 (42000): Unknown database 'team_stadiums'", "type": "UPDATE", "error": "" }, "history": [ { "role":...

Chucy2020

bug

help wanted

YSLIU627

enhancement

AgentBench

Evaluate other Chinese (domestic) models

Feat: Supports multiple API keys and distributes calls evenly in http_agent.

[Feature] Could you support downloading the environments locally instead of using Docker?
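The http_agent feature request above asks for spreading calls evenly across several API keys. One simple way to do that is strict round-robin rotation. The following is a minimal sketch under that assumption: `RoundRobinKeyPool` and `build_headers` are hypothetical names, not part of AgentBench, and the Bearer-token header layout is an assumption about the target API.

```python
import itertools
import threading


class RoundRobinKeyPool:
    """Hands out API keys in strict rotation so calls are spread evenly.

    Hypothetical helper for illustration; not AgentBench's http_agent code.
    """

    def __init__(self, keys):
        keys = list(keys)
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)
        # A lock keeps concurrent workers from racing on the shared iterator.
        self._lock = threading.Lock()

    def next_key(self):
        with self._lock:
            return next(self._cycle)


def build_headers(pool):
    """Attach the next key as a Bearer token (header layout is an assumption)."""
    return {"Authorization": f"Bearer {pool.next_key()}"}


if __name__ == "__main__":
    pool = RoundRobinKeyPool(["key-a", "key-b", "key-c"])
    print([pool.next_key() for _ in range(6)])
```

Strict rotation guarantees an even split of call counts, but not of load if requests differ in cost; a weighted or least-recently-used scheme would be an alternative design.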
