SWE-bench
Reproducer Docker image
### Describe the feature
Hi! Thanks for all the work. After the 04/15 patch I can now reproduce most of the SWE-bench instances using the default harness. However, I'm still having trouble with (at least) Flask and scikit-learn, where the environments fail to initialize because of what I suspect is a Cython version mismatch. This fails even in a clean-slate Docker environment (example attached).
However, in your Repair Report (https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240415_eval_bug/README.md) you mentioned that you have successfully reproduced evaluation of the whole dataset. So either I'm doing something uniquely wrong, or the process still depends on the host environment and the environment you're using is unique in some way. I'd like to figure out which of these is the case :) It would be great if you could share more operational details about your test-running process: the environment, the exact scripts, or ideally even a Docker image that does it.
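To make the two environments easier to compare, something like the following could be run on both sides. It is only a sketch: the function name `environment_fingerprint` and the default package list are my own choices, not part of the SWE-bench harness.

```python
import json
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages=("pip", "setuptools", "wheel", "Cython", "numpy")):
    """Collect interpreter, platform, and package versions for comparison.

    `packages` is an illustrative default; a package that is not installed
    is reported as None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Print as JSON so two outputs can be diffed directly.
    print(json.dumps(environment_fingerprint(), indent=2))
```

Running it inside the `swe-bench` conda environment and inside one of the generated testbed environments would show whether they use the interpreter and build tooling I expect.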
Thanks!
Repro of my failing attempt to set up the harness for scikit-learn:
- Test script:
from swebench.harness.context_manager import TestbedContextManager
from swebench.metrics.getters import get_eval_refs

if __name__ == "__main__":
    insts = get_eval_refs("princeton-nlp/SWE-bench")
    # only take scikit-learn for repro
    insts = {k: v for (k, v) in insts.items() if v["repo"].endswith("scikit-learn")}
    # simply create the context manager
    # Note: leaving both `conda_link` and `path_conda` empty to use the default logic, whatever it is
    tcm = TestbedContextManager(
        list(insts.values()),
        "/tmp/swebench_logs",
        testbed=str("/tmp/swebench_eval_dir/testbed"),
    )
    # just enter it and print all tasks
    with tcm:
        distributed_task_list = tcm.get_distributed_tasks()
        for task_list in distributed_task_list:
            print(
                f"{task_list['testbed']}: {len(task_list['task_instances'])} instances"
            )
- Dockerfile:
FROM continuumio/miniconda3
WORKDIR /workdir
RUN git clone https://github.com/princeton-nlp/SWE-bench /workdir
RUN conda env create -f environment.yml
RUN echo "conda activate swe-bench" >> ~/.bashrc
# pre-cache the SWE-bench HF dataset to avoid re-downloading it every time
RUN conda run -n swe-bench python -c 'from swebench.metrics.getters import get_eval_refs; get_eval_refs("princeton-nlp/SWE-bench")'
COPY test_script.py test_script.py
- Command:
docker run $(docker build --quiet .) bash -c ". activate swe-bench && python test_script.py"
- Output with error: (collapsed below)
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [54 lines of output]
Running from numpy source directory.
setup.py:470: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
run_build = parse_setuppy_commands()
Error compiling Cython file:
------------------------------------------------------------
...
    cdef sfc64_state rng_state

    def __init__(self, seed=None):
        BitGenerator.__init__(self, seed)
        self._bitgen.state = <void *>&self.rng_state
        self._bitgen.next_uint64 = &sfc64_uint64
                                   ^
------------------------------------------------------------
_sfc64.pyx:90:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to the type of the value being assigned.
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_sfc64.pyx
Traceback (most recent call last):
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 235, in <module>
main()
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 231, in main
find_process_files(root_dir)
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 222, in find_process_files
process(root_dir, fromfile, tofile, function, hash_db)
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 188, in process
processor_function(fromfile, tofile)
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 77, in process_pyx
subprocess.check_call(
File "/tmp/tmpy54mprlj/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/tmp/tmpy54mprlj/miniconda3/bin/python', '-m', 'cython', '-3', '--fast-fail', '-o', '_sfc64.c', '_sfc64.pyx']' returned non-zero exit status 1.
Cythonizing sources
Traceback (most recent call last):
File "/tmp/tmpy54mprlj/miniconda3/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/tmp/tmpy54mprlj/miniconda3/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/tmp/tmpy54mprlj/miniconda3/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-s9yvqxy1/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 157, in prepare_metadata_for_build_wheel
self.run_setup()
File "/tmp/pip-build-env-s9yvqxy1/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 248, in run_setup
super(_BuildMetaLegacyBackend,
File "/tmp/pip-build-env-s9yvqxy1/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 142, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 499, in <module>
setup_package()
File "setup.py", line 479, in setup_package
generate_cython()
File "setup.py", line 274, in generate_cython
raise RuntimeError("Running cythonize failed!")
RuntimeError: Running cythonize failed!
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
2024-05-01 23:48:09,346 - testbed - ERROR - Error traceback: Traceback (most recent call last):
File "/workdir/swebench/harness/context_manager.py", line 82, in call
output = subprocess.run(cmd, **combined_args)
File "/opt/conda/envs/swe-bench/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '. /tmp/tmpy54mprlj/miniconda3/bin/activate scikit-learn__scikit-learn__0.20 && pip install numpy==1.19.2 scipy==1.5.2' returned non-zero exit status 1.
Traceback (most recent call last):
File "/workdir/test_script.py", line 18, in
</details>
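The log shows `numpy==1.19.2` being built from source, and the `noexcept` message looks like the stricter exception handling introduced in Cython 3. A rough way to test that suspicion is to pin Cython below 3 for pip's isolated build environment through the `PIP_CONSTRAINT` environment variable and retry the failing install. This is a minimal sketch, not something the harness does: the helper name `constrained_pip_env` is hypothetical, and I have not verified that the pin resolves the failure.

```python
import os
import sys
import tempfile


def constrained_pip_env(constraints, base_env=None):
    """Write a pip constraints file and return (env, path).

    `env` is a copy of `base_env` (default: os.environ) with PIP_CONSTRAINT
    pointing at the file, so pip subprocesses that inherit it, including the
    ones that install build dependencies, see the same pins.
    """
    fd, path = tempfile.mkstemp(prefix="pip-constraints-", suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(constraints) + "\n")
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_CONSTRAINT"] = path
    return env, path


if __name__ == "__main__":
    # Assumption: "cython<3" is the pin worth trying first.
    env, path = constrained_pip_env(["cython<3"])
    cmd = [sys.executable, "-m", "pip", "install", "numpy==1.19.2", "scipy==1.5.2"]
    # Only print the command; pass `env=env` to subprocess.run to actually retry.
    print(f"PIP_CONSTRAINT={path}", " ".join(cmd))
```

If the install succeeds with the pin and fails without it, that would point at the unpinned build-time Cython rather than anything specific to my host.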
### Potential Solutions
Would it be possible for you to include a full command/script that, when run on a clean environment, will set up each instance and confirm that the golden solution correctly solves it?