AgentBench issues

[Bug/Assistance] DBBench Unknown database

1

In DBBench, every user reply is "1049 (42000): Unknown database 'xxx'", e.g. {"index": 299, "error": null, "info": null, "output": {"index": 299, "status": "completed", "result": {"answer": "1049 (42000): Unknown database 'Team Information'", "type": "UPDATE", "error": ""}
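To see how widespread the error is in a run, the answers can be tallied per database name. The sketch below makes several assumptions: the results are stored one JSON record per line, each record has the `output.result.answer` layout shown in the record above, and the function name is hypothetical. It counts which database names trigger error 1049:

```python
import json
import re
from collections import Counter
from typing import Iterable

# Matches the MySQL error text quoted in the issue and captures the database name.
UNKNOWN_DB = re.compile(r"^1049 \(42000\): Unknown database '(.*)'$")


def unknown_databases(lines: Iterable[str]) -> Counter:
    """Count how often each database name appears in a 1049 error answer."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed layout: {"output": {"result": {"answer": ...}}}
        result = (record.get("output") or {}).get("result") or {}
        answer = result.get("answer")
        if isinstance(answer, str):
            match = UNKNOWN_DB.match(answer)
            if match:
                counts[match.group(1)] += 1
    return counts
```

It can be fed an open results file directly, e.g. `unknown_databases(open("runs.jsonl", encoding="utf-8"))`, where `runs.jsonl` is a placeholder for your own output file.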

LittleWhite0208

bug

help wanted

Can AgentBench run on the training set?

1

![image](https://github.com/THUDM/AgentBench/assets/77482343/ef449feb-b9a6-4ac2-af47-d47c07f177ad) ![image](https://github.com/THUDM/AgentBench/assets/77482343/1255f420-7bc2-41a9-b77e-8d3b0d8fc5f8) ![image](https://github.com/THUDM/AgentBench/assets/77482343/e51508d5-e30f-4f65-9310-cea6b4b097f1) ![image](https://github.com/THUDM/AgentBench/assets/77482343/1d7465bd-19d1-4910-92bc-cd92b7e86f74) As shown in the screenshots, I sampled a few examples from the original ALFWorld train split to use as a training set and configured the relevant parameters, but every result comes out as unknown. Is there a way to access the ALFWorld training set? (Likewise, is there a way to access the WebShop training set? For example, which parameter should I change in the setup shown below?) ![image](https://github.com/THUDM/AgentBench/assets/77482343/33d4a195-e663-45bb-89af-d48e3c6c0a43)

Fu-Dayuan

bug

help wanted

dbbench-std: Task Output Seems Correct But MD5 Mismatches

1

I looked into one particular DBBench task. GPT-4 seems to have given the right answer, but the MD5 doesn't match. Steps to reproduce the behavior: 1. Run a task with line...
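One general reason a visually correct answer can still fail an MD5 check is that the hash is taken over an exact byte string. Row order or value formatting (for example `3` vs `3.0`) therefore changes it. The sketch below illustrates this. It assumes a hypothetical serialization (rows rendered with `repr`) and is not necessarily how DBBench computes its hash:

```python
import hashlib
from typing import Iterable, Tuple

Row = Tuple[object, ...]


def md5_raw(rows: Iterable[Row]) -> str:
    """MD5 of rows serialized in the order given (order-sensitive)."""
    return hashlib.md5(repr(list(rows)).encode("utf-8")).hexdigest()


def md5_normalized(rows: Iterable[Row]) -> str:
    """MD5 after sorting rows by their string form (order-insensitive)."""
    rendered = sorted(repr(row) for row in rows)
    return hashlib.md5("\n".join(rendered).encode("utf-8")).hexdigest()
```

Sorting removes order differences but not representation differences, so two tables that look identical can still hash differently. Comparing the serialized forms directly, rather than only the hashes, shows which kind of difference is involved.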

wchen-github

bug

help wanted

Not a single cg task runs successfully, and the task server receives no messages

1

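Hello, while reproducing the results I ran into a problem similar to [Issue 64](https://github.com/THUDM/AgentBench/issues/63). I tried every task, and all of them run normally except cg. Notably, the ltp task takes a long time to finish a single data item (about 10 minutes in my environment), so it is easy to conclude that ltp is not working either. The ltp task server backend keeps printing interaction messages, but the cg task server backend prints nothing at all. Both tasks emit `Warning: gpt-3.5-turbo-0613/cg-dev#11 failed with error START_FAILED {"detail":"Error: Worker not responding\n"} None `. I am using the ChatGPT 3.5 API and have set all concurrency values to 1. My configuration and related screenshots follow:
### default.yaml
```
import: definition.yaml
concurrency:
  task:
    cg-dev: 1
  agent:
    gpt-3.5-turbo-0613: 1
assignments: #...
```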

Jianzhao-Huang

bug

help wanted

[Bug/Assistance] DBbench task evaluation results do not match the leaderboard

1

I ran the dbbench-std task with 5 workers. The open-source models all come from Hugging Face and are deployed with FastChat.

| Model | Actual score | Leaderboard score |
| - | - | - |
| gpt-3.5-turbo-0613 | 37.667 | 15.00 |
| llama2-13b-chat | 25.00 | 4.50 |
|...

SummerXIATIAN

bug

help wanted

[Bug/Assistance] The option link fails to navigate

**Describe the bug** The official website does not navigate when I switch between link options. **To Reproduce** Steps to reproduce the behavior: 1. Go to https://llmbench.ai/safety/data 2. Click on AgentBench...

zhimin-z

bug

help wanted

[Assistance] Number of problems in the OS dataset

2

Hi, I have counted the number of data samples or problems in the 'os_interaction' folder, and my count shows a total of 191 samples. However, the table that provides statistics...
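A script like the following sketch can double-check the count. It assumes, without verification, that each `.json` file under the folder holds either a list of problems or a single problem object. Files in other formats, or JSON files that are not problems, would skew the total, which may itself explain a discrepancy. The function name is hypothetical:

```python
import json
from pathlib import Path
from typing import Dict


def count_samples(root: str) -> Dict[str, int]:
    """Map each .json file under root (searched recursively) to its sample count."""
    base = Path(root)
    counts: Dict[str, int] = {}
    for path in sorted(base.rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: a JSON list holds many samples; any other value is one sample.
        counts[path.relative_to(base).as_posix()] = len(data) if isinstance(data, list) else 1
    return counts
```

Summing `count_samples(...).values()` over the `os_interaction` folder gives a total. The per-file breakdown shows which files contribute to any difference from the published statistics.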

deema-A

bug

help wanted

Evaluation results are always 0 and differ from the leaderboard

4

I want to evaluate [vicuna_7b_v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) on the webshop task. Following `configs/agents/fastchat_client.yaml`, I set the agent config as follows:
```
module: "src.agents.FastChatAgent"
parameters:
  controller_address: "http://localhost:5000"
  max_new_tokens: 128...
```

lynneChan

Minimum r in “Evaluation Prompt Setup”?

1

We select the minimum $r$ such that count of all tokens in $(u_0, a_r, u_{r+1}, \cdots, u_k)$ is not greater than 3500. ``` cn 1. Why 3500 and not some other number? 2....
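The quoted rule always keeps the first user turn $u_0$. It then drops the oldest turns after it until the remainder fits the budget, and picking the smallest such $r$ keeps as much history as possible. The sketch below assumes the dialogue is stored as $k+1$ user turns and $k$ agent turns. It uses a whitespace word count as a stand-in for a real tokenizer, so its budget counts words, not model tokens. The function names are hypothetical and this is not AgentBench's code:

```python
from typing import Callable, List, Optional


def count_tokens(text: str) -> int:
    """Placeholder token counter (assumption): whitespace-separated words."""
    return len(text.split())


def build_history(users: List[str], agents: List[str], r: int) -> List[str]:
    """Return (u_0, a_r, u_{r+1}, a_{r+1}, ..., a_{k-1}, u_k)."""
    k = len(users) - 1
    history = [users[0]]
    for i in range(r, k):
        history.append(agents[i])
        history.append(users[i + 1])
    return history


def select_min_r(
    users: List[str],
    agents: List[str],
    budget: int = 3500,
    counter: Callable[[str], int] = count_tokens,
) -> Optional[int]:
    """Smallest r whose truncated history fits within `budget`, else None."""
    k = len(users) - 1
    if len(agents) != k:
        raise ValueError("expected k agent turns for k + 1 user turns")
    for r in range(k):  # r = 0 keeps the full history
        if sum(counter(turn) for turn in build_history(users, agents, r)) <= budget:
            return r
    return None
```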

DryPilgrim

OS task: catch errors in container init

Added container start-up to the failure modes, with an error message if either the init or the start script fails. Before this change there were the following problems: - if...
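The behavior described here treats a failed init or start script as a reportable failure instead of continuing silently. It can be sketched as follows. This is not the PR's code: `exec_fn` stands in for whatever runs a script inside the container and returns its exit code and output, and all names are hypothetical:

```python
from typing import Callable, Optional, Tuple

# Hypothetical runner: executes a script in the container, returns (exit_code, output).
ExecFn = Callable[[str], Tuple[int, str]]


class ContainerInitError(RuntimeError):
    """Raised when a container's init or start script exits with a non-zero code."""


def init_container(exec_fn: ExecFn, init_script: Optional[str], start_script: Optional[str]) -> None:
    """Run the init script, then the start script, raising on the first failure."""
    for stage, script in (("init", init_script), ("start", start_script)):
        if not script:
            continue  # no script defined for this stage
        exit_code, output = exec_fn(script)
        if exit_code != 0:
            raise ContainerInitError(
                f"{stage} script failed with exit code {exit_code}: {output.strip()}"
            )
```

A caller can catch `ContainerInitError` and record the sample as failed, with the message attached, rather than running the task in a half-initialized container.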

rjmoss
