Predictions for the following instance_ids were not found in the tasks file and will not be considered: SWE-agent__test-repo-i1
Describe the bug
When I tried to reproduce the run as described in the Benchmarking docs, SWE-agent created a patch, but evaluating that patch produced the error below.
(venv) (base) hrushi669@Hrushikesh:/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/SWE-agent/evaluation$ ./run_eval.sh ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
Found 1 total predictions, will evaluate 1 (0 are empty)
Beginning evaluation...
2024-05-31 21:33:56,635 - run_evaluation - WARNING - Predictions for the following instance_ids were not found in the tasks file and will not be considered: SWE-agent__test-repo-i1
2024-05-31 21:33:56,640 - run_evaluation - INFO - Found 1 predictions across 1 model(s) in predictions file
Evaluation failed: 'SWE-agent__test-repo-i1'
Traceback (most recent call last):
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/SWE-agent/evaluation/evaluation.py", line 72, in main
run_evaluation(
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 122, in main
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 122, in main
t = tasks_map[p[KEY_INSTANCE_ID]]
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 'SWE-agent__test-repo-i1'
==================================
Log directory for evaluation run: results/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1
- Wrote per-instance scorecards to ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/scorecards.json
Reference Report:
- no_generation: 0
- generated: 1
- with_logs: 0
- install_fail: 0
- reset_failed: 0
- no_apply: 0
- applied: 0
- test_errored: 0
- test_timeout: 0
- resolved: 0
- Wrote summary of run to ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/results.json
results.json
{
"no_generation": [],
"generated": [
"SWE-agent__test-repo-i1"
],
"with_logs": [],
"install_fail": [],
"reset_failed": [],
"no_apply": [],
"applied": [],
"test_errored": [],
"test_timeout": [],
"resolved": []
}
Steps/commands/code to Reproduce
As described in the Benchmarking docs, but with the model azure:gpt-3.5-turbo-1106.
Error message/results
(Same command and log output as shown in the bug description above.)
System Information
Ubuntu
Checklist
- [X] I'm running with the latest docker container/on the latest development version
- [X] I've searched the other issues for a duplicate
- [X] I have copied the full command/code that I ran (as text, not as screenshot!)
- [X] If applicable: I have copied the full log file/error message that was the result (as text, not as screenshot!)
- [X] I have enclosed code/log messages in triple backticks (docs) and clicked "Preview" to make sure it's displayed correctly.
Thanks for the report and the already verbose information. Could you also paste the command you ran for evaluation?
(my suspicion is that you might be specifying the wrong input file, just because I know that this happened to me before...)
> Thanks for the report and the already verbose information. Could you also paste the command you ran for evaluation?
the evaluation command:
./run_eval.sh ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
> (my suspicion is that you might be specifying the wrong input file, just because I know that this happened to me before...)
I gave it the correct path to the input file (all_preds.jsonl), as shown in the evaluation command above. Can you please help me out with this?
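For reference, a quick way to list which instance_ids a predictions file actually contains (a minimal sketch; the instance_id field name is an assumption based on the KEY_INSTANCE_ID lookup in the traceback above). Every id printed here must also exist in the tasks dataset used for evaluation, otherwise the harness raises exactly the KeyError shown:
```
# List the instance_ids in the predictions file (field name assumed).
jq -r '.instance_id' all_preds.jsonl
```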
There is also another issue, even though Miniconda is installed on my PC:
(venv) (base) hrushi669@Hrushikesh:/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation$ ./run_eval.sh ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
Found 8 total predictions, will evaluate 3 (5 are empty)
Beginning evaluation...
2024-06-01 17:03:59,954 - run_evaluation - INFO - Found 3 predictions across 1 model(s) in predictions file
2024-06-01 17:03:59,963 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/django__django/4.1] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:03:59,978 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/django__django/3.0] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:03:59,994 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/sphinx-doc__sphinx/3.5] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:04:00,075 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_django_3.0.log
2024-06-01 17:04:00,075 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_django_4.1.log
2024-06-01 17:04:00,076 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_sphinx_3.5.log
2024-06-01 17:04:00,079 - testbed - INFO - Repo django/django: 1 versions
2024-06-01 17:04:00,079 - testbed - INFO - Repo django/django: 1 versions
2024-06-01 17:04:00,080 - testbed - INFO - Repo sphinx-doc/sphinx: 1 versions
2024-06-01 17:04:00,082 - testbed - INFO - Version 4.1: 1 instances
2024-06-01 17:04:00,082 - testbed - INFO - Version 3.0: 1 instances
2024-06-01 17:04:00,084 - testbed - INFO - Version 3.5: 1 instances
2024-06-01 17:04:00,092 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7
2024-06-01 17:04:00,094 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf
2024-06-01 17:04:00,095 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0
2024-06-01 17:04:00,106 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpc4yi_9nb for testbed
2024-06-01 17:04:00,107 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0eb_sfio for testbed
2024-06-01 17:04:00,108 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmpx0a2ox00 for testbed
2024-06-01 17:04:00,118 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3...
2024-06-01 17:04:00,119 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3...
2024-06-01 17:04:00,119 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3...
2024-06-01 17:04:00,124 - testbed - INFO - django/3.0 instances in a single process
2024-06-01 17:04:00,125 - testbed - INFO - sphinx/3.5 instances in a single process
2024-06-01 17:04:00,125 - testbed - INFO - django/4.1 instances in a single process
2024-06-01 17:04:00,128 - testbed - INFO - django/3.0 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-06-01 17:04:00,128 - testbed - INFO - sphinx/3.5 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-06-01 17:04:00,129 - testbed - INFO - django/4.1 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-06-01 17:10:29,623 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,627 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,634 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3
Unpacking payload ...
0%| | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 0%| | 0/69 [01:01<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 1%|β | 1/69 [01:01<1:09:58, 61.74s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 1%|β | 1/69 [02:13<1:09:58, 61.74s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 3%|β | 2/69 [02:13<1:15:25, 67.54s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda: 3%|β | 2/69 [02:13<1:15:25, 67.54s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda: 4%|β | 3/69 [02:13<1:14:17, 67.54s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda: 6%|β | 4/69 [02:13<1:13:10, 67.54s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda: 7%|β | 5/69 [02:13<1:12:02, 67.54s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda: 9%|β | 6/69 [02:13<1:10:55, 67.54s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda: 10%|β | 7/69 [02:13<1:09:47, 67.54s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda: 12%|ββ | 8/69 [02:13<1:08:39, 67.54s/it]
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 609, in result_iterator
File "concurrent/futures/_base.py", line 446, in result
File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21327] Failed to execute script 'entry_point' due to unhandled exception!
2024-06-01 17:10:29,639 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3
Unpacking payload ...
0%| | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 0%| | 0/69 [01:09<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 1%|β | 1/69 [01:09<1:18:49, 69.56s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 1%|β | 1/69 [02:37<1:18:49, 69.56s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 3%|β | 2/69 [02:37<1:29:34, 80.21s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda: 3%|β | 2/69 [02:37<1:29:34, 80.21s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda: 4%|β | 3/69 [02:37<1:28:13, 80.21s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda: 6%|β | 4/69 [02:37<1:26:53, 80.21s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda: 7%|β | 5/69 [02:37<1:25:33, 80.21s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda: 9%|β | 6/69 [02:37<1:24:13, 80.21s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda: 10%|β | 7/69 [02:37<1:22:53, 80.21s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda: 12%|ββ | 8/69 [02:37<1:21:32, 80.21s/it]
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 609, in result_iterator
File "concurrent/futures/_base.py", line 446, in result
File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21337] Failed to execute script 'entry_point' due to unhandled exception!
2024-06-01 17:10:29,659 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,669 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3
Unpacking payload ...
0%| | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 0%| | 0/69 [01:03<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 1%|β | 1/69 [01:03<1:11:27, 63.04s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 1%|β | 1/69 [02:14<1:11:27, 63.04s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 3%|β | 2/69 [02:14<1:16:00, 68.07s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda: 3%|β | 2/69 [02:14<1:16:00, 68.07s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda: 4%|β | 3/69 [02:14<1:14:52, 68.07s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda: 6%|β | 4/69 [02:14<1:13:44, 68.07s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda: 7%|β | 5/69 [02:14<1:12:36, 68.07s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda: 9%|β | 6/69 [02:14<1:11:28, 68.07s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda: 10%|β | 7/69 [02:14<1:10:20, 68.07s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda: 12%|ββ | 8/69 [02:14<1:09:12, 68.07s/it]
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 609, in result_iterator
File "concurrent/futures/_base.py", line 446, in result
File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21328] Failed to execute script 'entry_point' due to unhandled exception!
2024-06-01 17:10:29,725 - testbed - ERROR - Error traceback: Traceback (most recent call last):
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,728 - testbed - ERROR - Error traceback: Traceback (most recent call last):
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,734 - testbed - ERROR - Error traceback: Traceback (most recent call last):
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
Evaluation failed: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
^^^^^^^^^^^^^^^^
File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/engine_evaluation.py", line 177, in main
setup_testbed(data_groups[0])
File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/engine_validation.py", line 91, in
setup_testbed
with TestbedContextManager(
File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 285, in
__enter__
self.exec(install_cmd)
File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 95, in
__call__
raise e
File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in
__call__
output = subprocess.run(cmd, **combined_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash',
'/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda
.sh', '-b', '-u', '-p',
'/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&',
'conda', 'init', '--all']' returned non-zero exit status 1.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/evaluation.py", line 72, in main
run_evaluation(
File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 203, in main
pool.map(eval_engine, eval_args)
File "/usr/lib/python3.12/multiprocessing/pool.py", line 367, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/multiprocessing/pool.py", line 774, in get
raise self._value
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
==================================
Log directory for evaluation run: results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1
- Wrote per-instance scorecards to ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/scorecards.json
Reference Report:
- no_generation: 5
- generated: 3
- with_logs: 0
- install_fail: 0
- reset_failed: 0
- no_apply: 0
- applied: 0
- test_errored: 0
- test_timeout: 0
- resolved: 0
- Wrote summary of run to ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/results.json
This error has been haunting me for 3 hours now; can anyone help? @klieret
Regarding your initial report: I believe you need to specify the dataset name or path as the second argument (the default is princeton-nlp/SWE-bench, which is probably not what you need here).
Let me ping @carlosejimenez and @john-b-yang for this issue.
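For illustration, a corrected invocation might look like the sketch below, assuming run_eval.sh forwards its second argument as the SWE-bench tasks dataset (a Hugging Face dataset name or a local path):
```
# Hypothetical invocation: <dataset_name_or_path> is a placeholder and must
# contain the instances referenced in all_preds.jsonl; the default
# princeton-nlp/SWE-bench does not contain SWE-agent__test-repo-i1.
./run_eval.sh path/to/all_preds.jsonl <dataset_name_or_path>
```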
Hi @Hk669, just to be clear: I noticed that you used a "test-repo" in your examples. I'm not sure if that's just a placeholder, but generally the evaluation process only works with examples that are already in the SWE-bench dataset, because we have the correct tests and behavior logged only for those instances.
However, the other command you mentioned: ./run_eval.sh ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
does seem like an issue. Can you confirm the version of swebench that you're using? Can you make sure to use the latest version of the repository/package?
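One quick way to check and refresh the installed version, for example:
```
# Print the installed swebench version, then upgrade it if it is outdated.
pip show swebench
pip install --upgrade swebench
```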
Hi, I am having a similar issue: I do not know where to get the SWE-bench Lite dataset.
Here is what I'm running:
#!/bin/bash
python run_evaluation.py \
    --predictions_path "../../../SWE-agent/trajectories/vscode/gpt4__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/all_preds.jsonl" \
    --swe_bench_tasks "SWE-bench_Lite/data/dev-00000-of-00001.parquet" \
    --log_dir "logs" \
    --testbed "testbed" \
    --skip_existing \
    --timeout 900 \
    --verbose
I also tried to convert the data to a .json file with a Python script, but that did not work either. Could you help direct me to the dataset?
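As a hedged sketch, the Lite dataset can typically be pulled from the Hugging Face Hub and exported to JSON like this, assuming it is published as princeton-nlp/SWE-bench_Lite (the Lite counterpart of the princeton-nlp/SWE-bench default mentioned above):
```
# Download the dev split (matching the parquet file above) and write it out
# as JSON for use with --swe_bench_tasks.
python -c "
from datasets import load_dataset
ds = load_dataset('princeton-nlp/SWE-bench_Lite', split='dev')
ds.to_json('swe-bench-lite-dev.json')
"
```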
I ran SWE-agent with the regular command.
python run.py --model_name gpt4 \
    --per_instance_cost_limit 2.00 \
    --config_file ./config/default.yaml