Predictions for the following instance_ids were not found in the tasks file and will not be considered: SWE-agent__test-repo-i1
Describe the bug
When I reproduced the run described in the Benchmarking documentation, SWE-agent created a patch, but evaluating the resulting all_preds.jsonl fails with a KeyError for SWE-agent__test-repo-i1 (full log under "Error message/results" below).
results.json:

```json
{
  "no_generation": [],
  "generated": [
    "SWE-agent__test-repo-i1"
  ],
  "with_logs": [],
  "install_fail": [],
  "reset_failed": [],
  "no_apply": [],
  "applied": [],
  "test_errored": [],
  "test_timeout": [],
  "resolved": []
}
```
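To pin down why the instance is skipped, here is a minimal sketch of a check that compares the instance_ids in `all_preds.jsonl` against those in the tasks file that `run_eval.sh` hands to the evaluation harness. The file paths are placeholders, and the assumption that the tasks file is a JSON list of task dicts carrying an `instance_id` field is inferred from the `tasks_map[p[KEY_INSTANCE_ID]]` lookup in the traceback below, not taken from the SWE-bench documentation:

```python
import json
from pathlib import Path

# Placeholder paths -- substitute the prediction file passed to run_eval.sh
# and the tasks file the evaluation harness is configured with.
PREDS_PATH = Path("all_preds.jsonl")
TASKS_PATH = Path("swe-bench-tasks.json")

# all_preds.jsonl is JSONL: one prediction object per line with an
# "instance_id" key (matching KEY_INSTANCE_ID in the traceback).
pred_ids = {
    json.loads(line)["instance_id"]
    for line in PREDS_PATH.read_text().splitlines()
    if line.strip()
}

# Assumption: the tasks file is a JSON list of task dicts, each with an
# "instance_id" field, mirroring the tasks_map lookup that raises the KeyError.
task_ids = {task["instance_id"] for task in json.loads(TASKS_PATH.read_text())}

missing = sorted(pred_ids - task_ids)
print(f"{len(pred_ids)} prediction(s); missing from the tasks file: {missing}")
```

With the files from this run I would expect it to list SWE-agent__test-repo-i1 as missing, which is consistent with the WARNING line in the evaluation log.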
Steps/commands/code to Reproduce
I followed the steps in the Benchmarking documentation, but with the model azure:gpt-3.5-turbo-1106.
Error message/results
```
(venv) (base) hrushi669@Hrushikesh:/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/SWE-agent/evaluation$ ./run_eval.sh ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
Found 1 total predictions, will evaluate 1 (0 are empty)
🏃 Beginning evaluation...
2024-05-31 21:33:56,635 - run_evaluation - WARNING - Predictions for the following instance_ids were not found in the tasks file and will not be considered: SWE-agent__test-repo-i1
2024-05-31 21:33:56,640 - run_evaluation - INFO - Found 1 predictions across 1 model(s) in predictions file
❌ Evaluation failed: 'SWE-agent__test-repo-i1'
Traceback (most recent call last):
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/SWE-agent/evaluation/evaluation.py", line 72, in main
    run_evaluation(
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 122, in main
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 122, in main
    t = tasks_map[p[KEY_INSTANCE_ID]]
        ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 'SWE-agent__test-repo-i1'
==================================
Log directory for evaluation run: results/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1
- Wrote per-instance scorecards to ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/scorecards.json
Reference Report:
- no_generation: 0
- generated: 1
- with_logs: 0
- install_fail: 0
- reset_failed: 0
- no_apply: 0
- applied: 0
- test_errored: 0
- test_timeout: 0
- resolved: 0
- Wrote summary of run to ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/results.json
```
System Information
Ubuntu (running under WSL, per the /mnt/c/... paths in the log)
Checklist
- [X] I'm running with the latest docker container/on the latest development version
- [X] I've searched the other issues for a duplicate
- [X] I have copied the full command/code that I ran (as text, not as screenshot!)
- [X] If applicable: I have copied the full log file/error message that was the result (as text, not as screenshot!)
- [X] I have enclosed code/log messages in triple backticks (docs) and clicked "Preview" to make sure it's displayed correctly.