Error while reproducing the results on SWE-bench

Open junleiz opened this issue 4 months ago • 0 comments

Instance pytest-dev__pytest-7236 - 2025-08-03 21:23:34,716 - INFO - Starting evaluation for instance pytest-dev__pytest-7236.
Hint: run "tail -f evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/qwen3-coder_maxiter_100_N_v0.44.0-no-hint-princeton-nlp/SWE-bench_Verified_test_llm.qwen3-coder_workers_32_eval_limit_500_max_iter_100_n_runs_3_mode_swe-run_1/infer_logs/instance_pytest-dev__pytest-7236.log" to see live logs in a separate shell
Instance pytest-dev__pytest-7236 - 2025-08-03 23:10:10,594 - ERROR - ----------
Error in instance [pytest-dev__pytest-7236]: Failed to cd to /workspace/pytest-dev__pytest__5.4: CmdOutputObservation (source=None, exit code=-1, metadata={
  "exit_code": -1,
  "pid": -1,
  "username": null,
  "hostname": null,
  "working_dir": null,
  "py_interpreter_path": null,
  "prefix": "[Below is the output of the previous command.]\n",
  "suffix": "\n[Your command "cd /workspace/pytest-dev__pytest__5.4" is NOT executed. The previous command is still running - You CANNOT send new commands until the previous command is completed. By setting is_input to true, you can interact with the current process: You may wait longer to see additional output of the previous command by sending empty command '', send other commands to interact with the current process, or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]"
})
--BEGIN AGENT OBSERVATION--
[Below is the output of the previous command.]

[Your command "cd /workspace/pytest-dev__pytest__5.4" is NOT executed. The previous command is still running - You CANNOT send new commands until the previous command is completed. By setting is_input to true, you can interact with the current process: You may wait longer to see additional output of the previous command by sending empty command '', send other commands to interact with the current process, or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]
--END AGENT OBSERVATION--.
Stacktrace: Traceback (most recent call last):
  File "/home/zhangjunlei/code/willfarm/OpenHands/evaluation/utils/shared.py", line 403, in _process_instance_wrapper
    result = process_instance_func(instance, metadata, use_mp, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhangjunlei/code/willfarm/OpenHands/evaluation/benchmarks/swe_bench/run_infer.py", line 727, in process_instance
    return_val = complete_runtime_fn(runtime, instance)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhangjunlei/code/willfarm/OpenHands/evaluation/benchmarks/swe_bench/run_infer.py", line 547, in complete_runtime
    assert_and_raise(
  File "/home/zhangjunlei/code/willfarm/OpenHands/evaluation/utils/shared.py", line 310, in assert_and_raise
    raise EvalException(msg)
evaluation.utils.shared.EvalException: Failed to cd to /workspace/pytest-dev__pytest__5.4: CmdOutputObservation (source=None, exit code=-1, metadata={
  "exit_code": -1,
  "pid": -1,
  "username": null,
  "hostname": null,
  "working_dir": null,
  "py_interpreter_path": null,
  "prefix": "[Below is the output of the previous command.]\n",
  "suffix": "\n[Your command "cd /workspace/pytest-dev__pytest__5.4" is NOT executed. The previous command is still running - You CANNOT send new commands until the previous command is completed. By setting is_input to true, you can interact with the current process: You may wait longer to see additional output of the previous command by sending empty command '', send other commands to interact with the current process, or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]"
})
--BEGIN AGENT OBSERVATION--
[Below is the output of the previous command.]

----------[The above error occurred. Retrying... (attempt 1 of 5)]----------
Traceback (most recent call last):
  File "/home/zhangjunlei/code/willfarm/OpenHands/evaluation/utils/shared.py", line 403, in _process_instance_wrapper
    result = process_instance_func(instance, metadata, use_mp, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhangjunlei/code/willfarm/OpenHands/evaluation/benchmarks/swe_bench/run_infer.py", line 727, in process_instance
    return_val = complete_runtime_fn(runtime, instance)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhangjunlei/code/willfarm/OpenHands/evaluation/benchmarks/swe_bench/run_infer.py", line 547, in complete_runtime
    assert_and_raise(
  File "/home/zhangjunlei/code/willfarm/OpenHands/evaluation/utils/shared.py", line 310, in assert_and_raise
    raise EvalException(msg)
evaluation.utils.shared.EvalException: Failed to cd to /workspace/pytest-dev__pytest__5.4: CmdOutputObservation (source=None, exit code=-1, metadata={
  "exit_code": -1,
  "pid": -1,
  "username": null,
  "hostname": null,
  "working_dir": null,
  "py_interpreter_path": null,
  "prefix": "[Below is the output of the previous command.]\n",
  "suffix": "\n[Your command "cd /workspace/pytest-dev__pytest__5.4" is NOT executed. The previous command is still running - You CANNOT send new commands until the previous command is completed. By setting is_input to true, you can interact with the current process: You may wait longer to see additional output of the previous command by sending empty command '', send other commands to interact with the current process, or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]"
})
--BEGIN AGENT OBSERVATION--
[Below is the output of the previous command.]

I am just trying to reproduce the performance of qwen3-coder on SWE-bench. I followed the instructions for running run_infer.sh without changing much.

Aug 03 '25 15:08 junleiz