AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
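Environment: Linux. Problem description: I hit the same problem reported in other issues. After running for a few minutes, the controller eventually reports the error: Error: Worker not responding. I added -it to the docker command that starts the task and printed the output from inside the container; it hangs at "returning" (see the screenshot in the issue). When you have time, could you please check whether this task can run correctly? Configuration: start_task.yaml ``` definition: import: tasks/task_assembly.yaml start: cg-std: 1 ``` default.yaml ``` import: definition.yaml concurrency: task: cg-std: 1 agent: gpt-3.5-turbo-0613: 1 assignments:...
### This is my error ### This is my configuration - configs\start_task.yaml - configs\assignments\default.yaml - docker log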
Curious whether anyone has run, or plans to run, AgentBench on any Mistral models yet? Mixtral 8x7B seems like one of the best open-source models at the moment and the...
Both cg-std and ltp-std fail with Error: Worker not responding, and not a single sample runs successfully.
{"index": 145, "error": null, "info": null, "output": {"index": 145, "status": "task limit reached", "result": {"predict": [], "actions": []}, "history":
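When many samples end with a status such as "task limit reached", a quick tally of the `status` field shows how widespread the problem is. The following is a minimal sketch, assuming results are stored one JSON record per line with the status under `output.status` as in the record above; the function name `tally_statuses`, the sample records, and the "completed" status value are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def tally_statuses(lines: Iterable[str]) -> Counter:
    """Count records per output.status.

    Blank lines are skipped; lines that are not a JSON object are
    counted under "<unparsable>".
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparsable>"] += 1
            continue
        if not isinstance(record, dict):
            counts["<unparsable>"] += 1
            continue
        # "output" may be null when the run itself failed.
        output = record.get("output") or {}
        counts[output.get("status", "<missing>")] += 1
    return counts


# Hypothetical, shortened records; only the field layout mirrors the log above.
sample = [
    '{"index": 145, "error": null, "output": {"index": 145, "status": "task limit reached"}}',
    '{"index": 146, "error": null, "output": {"index": 146, "status": "completed"}}',
    '{"index": 147, "error": "boom", "output": null}',
]

for status, count in tally_statuses(sample).most_common():
    print(f"{status}: {count}")
```

To run it on a real results file, pass an open file handle instead of `sample`, since iterating a file yields one line at a time.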
Like several other posts, I also ran into the Worker not responding problem: https://github.com/THUDM/AgentBench/issues/63 https://github.com/THUDM/AgentBench/issues/56 https://github.com/THUDM/AgentBench/issues/53 https://github.com/THUDM/AgentBench/issues/87 In my case, however, only the cg and kg tasks are affected. After tracing it, I found that it hangs on this request: https://github.com/THUDM/AgentBench/blob/main/src/server/task_controller.py#L230 The error looks like this: ``` task KnowledgeGraph-std worker 2 error Cannot connect to host localhost:5001 ssl:default [Connect call failed ('127.0.1.1', 5001)] ```
Q: In the os dataset, sample os-std-003-ac-00000 has been retried several times and always fails with Worker not responding, while the other 143 samples evaluate normally. The error message is: `Warning: chatglm2-6b/os-std#std-003-ac-00000 failed with error INTERACT_FAILED {"detail":"Error: Worker not responding\n"} index=None status= result=None history=None` Model: chatglm2 + fastchat. I tried setting the number of workers in start_task.yaml to 1 or 5, and the error is unchanged. start_task.yaml ``` definition: import: tasks/task_assembly.yaml start: os-std: 1 ```...
ltp fails to start
The following error appears on startup. My installed packages are: ''' Package Version ----------------------- ------------ accelerate 0.23.0 aiohttp 3.8.6 aiosignal 1.2.0 anthropic 0.4.1 anyio 3.6.1 asttokens 2.4.1 async-timeout 4.0.2 attrs 21.4.0 beautifulsoup4 4.12.3 certifi 2023.5.7 charset-normalizer 2.1.0 click 8.1.3...
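The error above shows `localhost` resolving to `127.0.1.1` before the connection to port 5001 fails. One thing worth checking, without assuming it is the cause, is what `localhost` resolves to on the host and whether anything is listening on that port. The sketch below uses only the Python standard library; the helper names `resolve_ipv4` and `port_is_open` are hypothetical and not part of AgentBench.

```python
import socket


def resolve_ipv4(host: str) -> list[str]:
    """Return the distinct IPv4 addresses that `host` resolves to on this machine."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, comparing `port_is_open("localhost", 5001)` with `port_is_open("127.0.0.1", 5001)` on the machine that runs the controller, together with the output of `resolve_ipv4("localhost")`, indicates whether the worker is unreachable entirely or only under the address that `localhost` maps to. The port 5001 is taken from the error message; adjust it if the worker is configured differently.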
Could you release some run logs, such as conversation histories, for us to use as a reference? These logs could be used to check the gap between the scores of...
I want to evaluate the three game tasks. After configuring ``start_task.yaml``, it keeps reporting that the task does not exist. How can I resolve this? Error: start_task.yaml ```yaml definition: import: tasks/task_assembly.yaml start: # dbbench-std: 5 # os-std: 5 cg-std: 5 alfworld-std: 5 ltp-std: 5 ``` default.yaml ```yaml import: definition.yaml concurrency: task: # dbbench-std: 5...