Albert Örwall
### Describe the feature

I've been working on building Docker images for all testbeds used in SWE-Bench. This works quite well even if I still haven't got failing 18 benchmark...
#### Reference Issues/PRs

Partly solves #104

#### What does this implement/fix? Explain your changes.

This change is to make it possible to reuse conda environments when running evaluation.

* The...
I have evaluated your predictions using my [Docker-based swe-bench evaluator](https://github.com/aorwall/SWE-bench-docker/tree/main). I get 26% on pass@3, compared to the 22% you reported. It might be worthwhile to review the logs...
Great job with the new containerized evaluation tool! I've run it a couple of times on the golden patches on SWE-bench Lite and overall it gives a more stable result...