visualwebarena
Fix: Update test_webarena.raw.json for better evaluation.
Reason for Change: Some answers in test_webarena.raw.json are incorrect. I believe minor fixes are needed for more accurate evaluation.
Changes Made: I mainly fixed two types of configuration:
1. The answers for the intent “Show me the command to clone {{repo}} with SSH.” were inconsistent. Some configurations give the full command, "exact_match": "git clone ssh://[email protected]:2222/convexegg/super_awesome_robot.git", while others give only the URL, "exact_match": "ssh://[email protected]:2222/eriklindernoren/PyTorch-GAN.git". Since the intent asks for the command, I unified all answers to the first form, which includes the "git clone" prefix.
2. The answers for the intents “Open my latest updated issue that has the keyword ‘{{keyword}}’ in its title to check if it is closed” and “Open my latest created issue that has {{keyword}} in its title to check if it is closed” were inconsistent with each other. The first intent’s answer uses "fuzzy_match": ["Yes, it is closed"], while the second uses "exact_match": "Yes". I changed the second intent’s answers to use the same fuzzy_match form as the first.
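For reviewers who want to re-check both points, here is a minimal sketch of a consistency check. It is not part of this PR. It assumes the file is a JSON list of task objects with the fields "task_id", "intent_template", and "eval" -> "reference_answers"; those field names, the helper names, and the example host used in the usage note are assumptions for illustration.

```python
from collections import Counter


def _reference_answers(task):
    # Assumed layout: task["eval"]["reference_answers"] is a dict keyed by
    # match type ("exact_match", "fuzzy_match", ...). Missing keys yield {}.
    return (task.get("eval") or {}).get("reference_answers") or {}


def clone_answers_missing_prefix(tasks, prefix="git clone "):
    """Return task_ids of SSH-clone tasks whose exact_match lacks the prefix."""
    missing = []
    for task in tasks:
        template = task.get("intent_template") or ""
        if "clone" not in template or "SSH" not in template:
            continue
        answer = _reference_answers(task).get("exact_match")
        if isinstance(answer, str) and not answer.startswith(prefix):
            missing.append(task.get("task_id"))
    return missing


def match_types_used(tasks, template_keyword):
    """Count which match types appear for templates containing the keyword."""
    counts = Counter()
    for task in tasks:
        if template_keyword in (task.get("intent_template") or ""):
            counts.update(_reference_answers(task).keys())
    return dict(counts)
```

To run it against the real file, load it with json.load and pass the resulting list to both functions. After the changes in this PR, clone_answers_missing_prefix should return an empty list, and match_types_used(tasks, "to check if it is closed") should report only fuzzy_match.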
Testing: I tested these changes locally in a Docker environment and confirmed that no errors occurred as a result of these changes.
Request for Feedback: If there are any concerns or additional improvements you’d like me to make, please let me know.