SWE-bench issues

Results: 69 SWE-bench issues

Allow conda directory to be specified in run_evaluation.py

Enhancements to run_evaluation.py: This pull request adds an optional command-line argument `--path_conda` to the run_evaluation.py script. The main functionality enhancement is the ability to specify a custom path to a...

Jonty800
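
As a rough illustration of the kind of option this pull request describes, an optional `--path_conda` argument could be declared with `argparse` as sketched below. Only the flag name comes from the issue text; the `build_parser` helper, the help string, the `None` default, and the example path are assumptions for illustration and are not taken from run_evaluation.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; not the actual run_evaluation.py code."""
    parser = argparse.ArgumentParser(description="Evaluation CLI sketch")
    # Optional flag: when omitted, the value is None and the caller
    # can fall back to whatever default conda location it prefers.
    parser.add_argument(
        "--path_conda",
        type=str,
        default=None,
        help="(assumed) path to an existing conda installation",
    )
    return parser


if __name__ == "__main__":
    # Example path is illustrative only.
    args = build_parser().parse_args(["--path_conda", "/opt/miniconda3"])
    print(args.path_conda)
```

Leaving the default as `None` keeps the flag optional, so existing invocations without it would continue to work unchanged.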

Request for Baseline Model Testing Log Files for Research Purposes

I would be extremely grateful if you could share the log files from the baseline model testing process discussed in your study. Access to these logs, specifically those with names...

yczhou001

What are expected to submit for the leaderboard integration?

Check https://www.swebench.com and found: ![image](https://github.com/princeton-nlp/SWE-bench/assets/8592144/6c2e7ed9-3bc4-4046-a89f-a3f6d64a2533)

zhimin-z

Reproducible Experiments via MLFlow or Similar [FeatureRequest]

It would be nice to have an "Experiments" directory for reproducible research via MLflow or a similar tool.

moresearch

May I ask where can I download the generated results from Claude and GPTs?

I have read your paper and really like this work. Could I ask where can I download the generated results from Claude and GPTs? These results are beneficial to our...

itaowei

Running SWE-Llama(13B/7B) on HuggingFace Endpoints [FeatureRequest]

[HuggingFace Endpoints](https://huggingface.co/inference-endpoints/dedicated) seems to be an easier way to run SWE-Llama models on the cloud. Tips for coding this feature?

moresearch

Tests that should fail don't fail.

Thank you for providing a great data set. I have tested it in my environment and confirmed that the fail to pass test does not fail in the following instances,...

t-kurabayashi

evaluation

Issues in the test case parsing logic from the logs

Firstly, thanks for this useful dataset. Some of the pytest logs have a format such as this:

```
PASSED sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-CountVectorizer]
PASSED sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no...
```

anmolagarwal999

evaluation
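
The issue text above is truncated, but the sample log lines show parametrized test IDs that contain spaces. Assuming the concern is that splitting a log line on every space cuts such IDs short, one possible approach is to split only on the first run of whitespace. The sketch below is not the harness's parser; `STATUSES` and `parse_status_line` are hypothetical names introduced for illustration.

```python
from typing import Optional, Tuple

# Assumed set of status words that can start a pytest summary line.
STATUSES = ("PASSED", "FAILED", "ERROR", "SKIPPED")


def parse_status_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (test_id, status) for a 'STATUS test_id' line, else None."""
    # maxsplit=1 keeps everything after the status word intact,
    # including any spaces inside the parametrized test ID.
    parts = line.strip().split(maxsplit=1)
    if len(parts) == 2 and parts[0] in STATUSES:
        return parts[1], parts[0]
    return None


if __name__ == "__main__":
    sample = (
        "PASSED sklearn/feature_extraction/tests/test_text.py::"
        "test_callable_analyzer_error[file-AttributeError-"
        "'str' object has no attribute 'read'-CountVectorizer]"
    )
    print(parse_status_line(sample))
```

Lines that do not begin with a known status word return `None`, so unrelated log output is skipped rather than misread as a test result.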

logs are unusable with multiple test instances

`harness/run_evaluation.py` takes a `--log_dir` argument but if there are multiple predictions for a single model and test in the `--predictions_path` file they all write to a single file in the...

JasonGross

sometimes gold_patch cannot pass the test

This is a very challenging benchmark, I have learned a lot from it. Thank you for the effort you have put into this. I tested using the swe-llama13b you provided...

LuoKaiGSW

evaluation
