visualwebarena

Fix: Update test_webarena.raw.json for better evaluation.

Open yeonjooooni opened this issue 4 months ago • 0 comments

Reason for Change: Some answers in test_webarena.raw.json are incorrect. I believe minor fixes are needed for more accurate evaluation.

Changes Made: I mainly fixed two types of configuration:

1.	The answers for the intent “Show me the command to clone {{repo}} with SSH.” were inconsistent. Some configurations give the full command, "exact_match": "git clone ssh://[email protected]:2222/convexegg/super_awesome_robot.git", while others give only the URL, "exact_match": "ssh://[email protected]:2222/eriklindernoren/PyTorch-GAN.git". I unified the answers to the first form, which includes the git clone prefix.
2.	The answers for the intents “Open my latest updated issue that has the keyword ‘{{keyword}}’ in its title to check if it is closed” and “Open my latest created issue that has {{keyword}} in its title to check if it is closed” were inconsistent. The first intent’s answer uses "fuzzy_match": ["Yes, it is closed"], while the second uses "exact_match": "Yes". I unified the answers to the first form.
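For anyone who wants to re-check the file for the same two kinds of inconsistency, here is a rough sketch of how it could be done. It is not the script used for this change. It assumes each task entry carries `task_id`, `intent_template`, and `eval.reference_answers` keys; the helper names and the `git@example.com` host in the sample data are hypothetical.

```python
from collections import defaultdict


def answers_of(task):
    """Return the task's reference answers, or an empty dict if absent."""
    return (task.get("eval") or {}).get("reference_answers") or {}


def ssh_answers_missing_prefix(tasks, prefix="git clone "):
    """IDs of SSH-clone tasks whose exact_match answer lacks the command prefix."""
    flagged = []
    for task in tasks:
        template = task.get("intent_template") or ""
        exact = answers_of(task).get("exact_match")
        if "clone" in template and "SSH" in template and isinstance(exact, str):
            if not exact.startswith(prefix):
                flagged.append(task.get("task_id"))
    return flagged


def match_types_for(tasks, phrase):
    """Map each intent template containing `phrase` to the match types it uses."""
    seen = defaultdict(set)
    for task in tasks:
        template = task.get("intent_template") or ""
        if phrase in template:
            # Updating a set from a dict adds its keys (the match types).
            seen[template].update(answers_of(task))
    return dict(seen)


# Hypothetical sample entries, shaped like the assumed config format.
CLONE = "Show me the command to clone {{repo}} with SSH."
UPDATED = "Open my latest updated issue that has the keyword '{{keyword}}' in its title to check if it is closed"
CREATED = "Open my latest created issue that has {{keyword}} in its title to check if it is closed"

sample = [
    {"task_id": 1, "intent_template": CLONE,
     "eval": {"reference_answers": {"exact_match": "git clone ssh://git@example.com:2222/a/b.git"}}},
    {"task_id": 2, "intent_template": CLONE,
     "eval": {"reference_answers": {"exact_match": "ssh://git@example.com:2222/c/d.git"}}},
    {"task_id": 3, "intent_template": UPDATED,
     "eval": {"reference_answers": {"fuzzy_match": ["Yes, it is closed"]}}},
    {"task_id": 4, "intent_template": CREATED,
     "eval": {"reference_answers": {"exact_match": "Yes"}}},
]

print(ssh_answers_missing_prefix(sample))
print(match_types_for(sample, "to check if it is closed"))
```

To run it against the real file, load the task list with `json.load` and pass it in place of `sample`.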

Testing: I tested these changes locally in a Docker environment and confirmed that no errors occurred as a result of these changes.

Request for Feedback: If there are any concerns or additional improvements you’d like me to make, please let me know.

yeonjooooni, Oct 04 '24 10:10