AgentGym

Sciworld environment task ID does not align with training data

Open Kelatte opened this issue 7 months ago • 3 comments

Issue Description:

When I look up an instruction by train ID in the sciworld_train.json downloaded from Hugging Face, it does not match the instruction that the repository's code returns for the same ID.

After investigation, I found that task 2-3 "measure-melting-point-unknown-substance" does not appear in the training set. Once I added 2-3 to the exceptions list, the IDs successfully corresponded.
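To illustrate why one missing entry in the exceptions list can shift every later ID, here is a minimal sketch. It assumes IDs are assigned by enumerating (task, variation) pairs in order and skipping excluded tasks; `build_index` and the variation counts are hypothetical toy values, not the repository's actual code or the real ScienceWorld numbers.

```python
def build_index(variations_per_task, exceptions):
    """Map a flat integer ID to (task_name, variation), skipping excluded tasks.

    Hypothetical helper: it only mimics the assumed enumeration scheme.
    """
    index = []
    for task_name, n_variations in variations_per_task.items():
        if task_name in exceptions:
            continue  # an excluded task consumes no IDs
        for variation in range(n_variations):
            index.append((task_name, variation))
    return index


# Toy variation counts, chosen only to keep the example small.
toy_tasks = {
    "boil": 2,
    "measure-melting-point-unknown-substance": 3,
    "freeze": 2,
}

without_exception = build_index(toy_tasks, exceptions=set())
with_exception = build_index(
    toy_tasks, exceptions={"measure-melting-point-unknown-substance"}
)

# The same flat ID resolves to a different task in the two schemes.
print(without_exception[2])  # ('measure-melting-point-unknown-substance', 0)
print(with_exception[2])     # ('freeze', 0)
```

Every ID at or after the skipped task's position points to a different task, which is consistent with the IDs lining up again once 2-3 is added to the exceptions.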

Questions:

  1. Is this a bug in the code?
  2. If this code is used for evaluation, would this cause partial overlap between the training set and test set?

Kelatte, May 29 '25 03:05

If my analysis is correct, using this code directly for evaluation would include 61 samples that come entirely from the training set. Have you run into this problem? Any help is appreciated! @WooooDyy
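A rough way to count this kind of overlap is to compare instruction strings directly. The sketch below assumes both sets are available as plain lists of instruction text; `find_overlap` and the sample strings are hypothetical, and the real sciworld_train.json layout may need its own loading step.

```python
def find_overlap(train_instructions, eval_instructions):
    """Return eval instructions that also appear verbatim in the training set."""
    # Strip surrounding whitespace so trivial formatting differences do not hide a match.
    train_set = {text.strip() for text in train_instructions}
    return [text for text in eval_instructions if text.strip() in train_set]


# Placeholder strings, not real ScienceWorld instructions.
train = ["Boil water.", "Melt ice."]
evaluation = ["Melt ice.", "Grow a plant."]

overlap = find_overlap(train, evaluation)
print(len(overlap), overlap)
```

Exact string matching only catches verbatim duplicates, so it gives a lower bound on the overlap.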

Kelatte, May 29 '25 03:05

@Kelatte Hi, may I ask you about a different task? Have you tried their WebArena environment? After installing it, I get an async-related issue.

YSLIU627, Jun 17 '25 00:06

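> @Kelatte Hi, may I ask you about a different task? Have you tried their WebArena environment? After installing it, I get an async-related issue.

Sorry, we haven't tried WebArena yet.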

Kelatte, Jun 17 '25 01:06