SWE-bench Issues in the test case parsing logic from the logs

Firstly, thanks for this useful dataset.

Some of the pytest logs have a format such as this:

PASSED sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-CountVectorizer]
PASSED sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-TfidfVectorizer]

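While parsing which test cases passed, the current code splits each line on whitespace and takes the second token as the test ID. Because these parametrized test IDs themselves contain spaces, they get truncated at the first space. As a result: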

test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-CountVectorizer] maps to sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str'
test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-TfidfVectorizer] maps to sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str'

so two distinct tests collapse into the same truncated ID.
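The truncation is easy to reproduce. The sketch below assumes the parser boils down to `line.split()` followed by taking the second token, as described above; `naive_parse` is a hypothetical name and a simplification, not the actual SWE-bench code.

```python
def naive_parse(log: str) -> dict[str, str]:
    """Hypothetical simplification of the buggy behaviour: the status is the
    first whitespace-separated token and the test ID is the second one."""
    status_map = {}
    for line in log.splitlines():
        tokens = line.split()  # splits on EVERY run of whitespace
        if tokens and tokens[0] in ("PASSED", "FAILED", "ERROR", "SKIPPED"):
            status_map[tokens[1]] = tokens[0]
    return status_map


log = """\
PASSED sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-CountVectorizer]
PASSED sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-TfidfVectorizer]
"""

result = naive_parse(log)
# Both lines end up under one truncated key ending in "[file-AttributeError-'str'".
print(result)
```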

These truncated IDs have also made it into the dataset, for example in scikit-learn__scikit-learn-14430, scikit-learn__scikit-learn-13554, and several other instances.

The correction requires only minor changes to the code. I'm posting this here for others using the dataset.
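One minimal way to make that correction, sketched under the assumption that each summary line has the form `STATUS <test id>`: split only on the first space, so everything after the status (spaces included) is kept as the test ID. `parse_log` and `STATUSES` are hypothetical names, not the SWE-bench API.

```python
STATUSES = ("PASSED", "FAILED", "ERROR", "SKIPPED", "XFAIL", "XPASS")


def parse_log(log: str) -> dict[str, str]:
    """Hypothetical fixed parser: split each line once, so spaces inside a
    parametrized test ID stay part of the ID."""
    status_map = {}
    for line in log.splitlines():
        parts = line.split(" ", 1)  # maxsplit=1: status, then the rest
        if len(parts) == 2 and parts[0] in STATUSES:
            status, test_id = parts
            status_map[test_id.strip()] = status
    return status_map


log = """\
PASSED sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-CountVectorizer]
PASSED sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-TfidfVectorizer]
"""

statuses = parse_log(log)
print(len(statuses))  # two distinct tests are kept
```

Caveat: pytest's short test summary appends ` - <message>` to FAILED and ERROR lines, and this sketch does not strip that suffix; a real parser would need extra handling for it.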

Jan 24 '24 14:01 anmolagarwal999

Hi @anmolagarwal999 thanks for your comment and patience.

I just took a look, and you are correct. Thanks for noticing this. If you have a fix, would you mind proposing a PR? From some manual inspection, most of these parsing issues seem to be isolated to the scikit-learn repo, but I'm curious to know what you think.

Apr 15 '24 23:04 john-b-yang

Thanks for the suggestions in #50. I will try the code posted in that PR and see whether it resolves some of the test case parsing issues that have been reported.

Based on an initial diagnosis, this issue mostly affects P2P (PASS_TO_PASS) tests. I'll keep working on it and post an update once the code is fixed, and if/when the dataset is updated.

Apr 16 '24 16:04 john-b-yang

@john-b-yang Additionally, there are cases where FAIL_TO_PASS test cases have the format function_name[absolute path], e.g. test_cal[/home/anmol/swebench/exp/1]. Since this absolute path varies across machines, even the gold patch fails.
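Such IDs only match on the machine that produced them. Purely as an illustration (not the fix SWE-bench adopted), one could normalize a parameter that looks like an absolute POSIX path before comparing IDs. `normalize_test_id` and the `<path>` placeholder are hypothetical.

```python
import re

# Matches a trailing parametrization whose content is an absolute POSIX path,
# e.g. "[/home/anmol/swebench/exp/1]".
ABS_PATH_PARAM = re.compile(r"\[/[^\]]*\]$")


def normalize_test_id(test_id: str) -> str:
    """Hypothetical helper: replace an absolute-path parameter with a fixed
    placeholder so IDs from different machines compare equal."""
    return ABS_PATH_PARAM.sub("[<path>]", test_id)


expected = normalize_test_id("test_cal[/home/anmol/swebench/exp/1]")
observed = normalize_test_id("test_cal[/tmp/other_machine/exp/1]")
print(expected, observed, expected == observed)
```

The trade-off is that two tests differing only in their path parameter would be merged into one ID, so this is only safe when such paths are incidental to the test.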

I rewrote the parsing pipeline using the pytest-json extension. It lets pytest output its results as a dictionary, which makes them much easier to parse and makes it easier to handle edge cases such as the one I mentioned above, plus here and here.
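With a structured report, the test ID and its outcome are separate fields, so spaces inside parameter IDs no longer matter. The sketch assumes a report shaped like `{"tests": [{"nodeid": ..., "outcome": ...}]}`, which is the layout used by the pytest-json-report plugin; the schema of the pytest-json extension mentioned above may differ, so treat these field names as assumptions. `statuses_from_report` is a hypothetical helper.

```python
import json


def statuses_from_report(report_text: str) -> dict[str, str]:
    """Map each test's node ID to an upper-cased outcome, assuming a
    {"tests": [{"nodeid": ..., "outcome": ...}]} report layout."""
    report = json.loads(report_text)
    return {t["nodeid"]: t["outcome"].upper() for t in report.get("tests", [])}


sample = json.dumps({
    "tests": [
        {"nodeid": "sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-CountVectorizer]", "outcome": "passed"},
        {"nodeid": "sklearn/feature_extraction/tests/test_text.py::test_callable_analyzer_error[file-AttributeError-'str' object has no attribute 'read'-TfidfVectorizer]", "outcome": "passed"},
    ]
})

report_statuses = statuses_from_report(sample)
print(report_statuses)
```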

As far as I could tell, introducing this more careful parsing with pytest-json did not cause any regressions in the original pipeline. If you are open to using the pytest-json extension, I can try to make a PR.

Apr 16 '24 16:04 anmolagarwal999

I see. The absolute-path case should already be handled at this point; it is failure mode 5 in our report.

At this time, I'll be focusing more on addressing any conda / installation issues that people are still seeing after the report, but I'll be sure to get around to fixing the parsing by the end of the month (if you have suggested fixes you wouldn't mind sharing via a PR, that would be greatly appreciated!)

Apr 16 '24 16:04 john-b-yang

Hey @anmolagarwal999, thanks again for creating this issue! There's been a good amount of progress since the original creation of the issue, and I believe these errors should have been resolved by a combination of contributions leading up to the latest big push #142.

If issues remain, I think it'd be more helpful at this point to start a new issue, and we can discuss more there. However, the log parsing has been updated for scikit-learn such that this shouldn't be an issue anymore.

Jul 02 '24 19:07 john-b-yang
