Predictions for the following instance_ids were not found in the tasks file and will not be considered: SWE-agent__test-repo-i1

Open Hk669 opened this issue 8 months ago • 8 comments

Describe the bug

When I tried to reproduce the logs as described in the Benchmarking docs, SWE-agent created a patch, but evaluating it produced the error below.

(venv) (base) hrushi669@Hrushikesh:/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/SWE-agent/evaluation$ ./run_eval.sh ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
Found 1 total predictions, will evaluate 1 (0 are empty)
🏃 Beginning evaluation...
2024-05-31 21:33:56,635 - run_evaluation - WARNING - Predictions for the following instance_ids were not found in the tasks file and will not be considered: SWE-agent__test-repo-i1
2024-05-31 21:33:56,640 - run_evaluation - INFO - Found 1 predictions across 1 model(s) in predictions file
❌ Evaluation failed: 'SWE-agent__test-repo-i1'
Traceback (most recent call last):
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/SWE-agent/evaluation/evaluation.py", line 72, in main
    run_evaluation(
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 122, in main
    t = tasks_map[p[KEY_INSTANCE_ID]]
        ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 'SWE-agent__test-repo-i1'

==================================
Log directory for evaluation run: results/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1
- Wrote per-instance scorecards to ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/scorecards.json
Reference Report:
- no_generation: 0
- generated: 1
- with_logs: 0
- install_fail: 0
- reset_failed: 0
- no_apply: 0
- applied: 0
- test_errored: 0
- test_timeout: 0
- resolved: 0
- Wrote summary of run to ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/results.json
results.json

{
  "no_generation": [],
  "generated": [
    "SWE-agent__test-repo-i1"
  ],
  "with_logs": [],
  "install_fail": [],
  "reset_failed": [],
  "no_apply": [],
  "applied": [],
  "test_errored": [],
  "test_timeout": [],
  "resolved": []
}
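The traceback above fails on `tasks_map[p[KEY_INSTANCE_ID]]`, which suggests the prediction's instance ID is looked up in a map of known tasks and is not there. The following is a minimal sketch of that failure mode and of a check that could be run before evaluation. It is not the swebench implementation: the value of `KEY_INSTANCE_ID`, the `model_patch` field, the helper `split_predictions`, and the sample task entry are all assumptions made for illustration.

```python
# Hypothetical sketch: separate predictions whose instance ID has no matching task.
KEY_INSTANCE_ID = "instance_id"  # assumed value; the traceback only shows the constant's name


def split_predictions(predictions, tasks_map):
    """Return (known, unknown) predictions based on membership in tasks_map."""
    known, unknown = [], []
    for pred in predictions:
        if pred[KEY_INSTANCE_ID] in tasks_map:
            known.append(pred)
        else:
            unknown.append(pred)
    return known, unknown


# Made-up tasks map that does not contain the test-repo instance.
tasks_map = {"example__repo-1": {"repo": "example/repo"}}
predictions = [
    {KEY_INSTANCE_ID: "SWE-agent__test-repo-i1", "model_patch": "<patch text>"},
]

known, unknown = split_predictions(predictions, tasks_map)
print("evaluable:", [p[KEY_INSTANCE_ID] for p in known])
print("not in tasks file:", [p[KEY_INSTANCE_ID] for p in unknown])

# A direct lookup without the membership check raises the same kind of error as in the log.
try:
    tasks_map[predictions[0][KEY_INSTANCE_ID]]
except KeyError as err:
    print("KeyError:", err)
```

Under these assumptions, the test-repo prediction lands in the "not in tasks file" list, which matches the WARNING line in the log.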

Steps/commands/code to Reproduce

I followed the steps in the Benchmarking docs, but with the model azure:gpt-3.5-turbo-1106.

Error message/results

Same output as shown under "Describe the bug" above.

System Information

Ubuntu

Checklist

  • [X] I'm running with the latest docker container/on the latest development version
  • [X] I've searched the other issues for a duplicate
  • [X] I have copied the full command/code that I ran (as text, not as screenshot!)
  • [X] If applicable: I have copied the full log file/error message that was the result (as text, not as screenshot!)
  • [X] I have enclosed code/log messages in triple backticks (docs) and clicked "Preview" to make sure it's displayed correctly.

Hk669, May 31 '24 16:05