AgentGym
Inconsistent number of instructions for sciworld_test.json on HF dataset
Dear authors,
Thanks for your great work!
I'm trying to reproduce the evaluation results as shown in the paper. However, I just noticed a difference in the number of instructions between the paper and the code.
Table 2 of the paper says there are 200 evaluation instructions for the Sciworld environment, but sciworld_test.json in the AgentEval HF dataset contains 1042 samples. Also, the conversation content of each sample should be an empty list ([]) rather than the full trajectory.
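For reference, here is a minimal sketch of how the counts above can be checked on a local copy of the file. It assumes the file is a single JSON list of sample objects and that the conversation content is stored under a `conversations` key; both the key name and the helper names `summarize` and `summarize_file` are my own assumptions, not taken from the AgentGym code.

```python
import json
from pathlib import Path


def summarize(samples, conv_key="conversations"):
    """Return (total samples, samples whose conversation content is non-empty).

    `conv_key` is an assumed field name; adjust it to the real schema.
    """
    # A missing key, None, or an empty list all count as "empty".
    non_empty = sum(1 for sample in samples if sample.get(conv_key))
    return len(samples), non_empty


def summarize_file(path, conv_key="conversations"):
    # Assumes the file holds one JSON list of sample objects.
    with Path(path).open(encoding="utf-8") as f:
        return summarize(json.load(f), conv_key)
```

Calling `summarize_file("sciworld_test.json")` on the downloaded file returns the sample count and the number of samples that still carry conversation content, which can be compared against the 200 samples and empty conversations expected from the paper.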
Could you please update the sciworld_test.json file on HF datasets to the correct version, which should contain 200 samples and no conversation content?
Thanks in advance.