PASS_TO_PASS tests failing on original program

Open: yuntongzhang opened this issue 11 months ago • 1 comment

Thank you for creating the dataset!

I was running scripts on the benchmark and noticed that some PASS_TO_PASS tests fail on the original version of the subject, before any patch is applied.

For example, in matplotlib__matplotlib-24334, I checked out the base_commit, entered the conda env, and ran the install commands. Executing the test_cmd pytest --no-header -rA --tb=no -p no:cacheprovider lib/matplotlib/tests/test_axes.py then results in some test failures on my system (Ubuntu 20.04 LTS). The following tests failed and are in the PASS_TO_PASS list of this instance:

FAILED lib/matplotlib/tests/test_axes.py::test_hist2d[png] - matplotlib.testing.exceptions.ImageComparisonFailure: images not close (RMS 5.559):
FAILED lib/matplotlib/tests/test_axes.py::test_hist2d[pdf] - matplotlib.testing.exceptions.ImageComparisonFailure: images not close (RMS 142.950):
FAILED lib/matplotlib/tests/test_axes.py::test_hist2d[svg] - matplotlib.testing.exceptions.ImageComparisonFailure: images not close (RMS 4.493):
FAILED lib/matplotlib/tests/test_axes.py::test_hist2d_transpose[pdf] - matplotlib.testing.exceptions.ImageComparisonFailure: images not close (RMS 155.125):

These tests appear to check histogram generation by comparing rendered images. I am not sure why they fail, but some system-level dependencies may be missing or have the wrong version on my system.
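In case it helps with reproducing this, here is a minimal sketch of how the failing PASS_TO_PASS tests can be picked out of the pytest output. The function name failed_pass_to_pass is hypothetical and not part of the benchmark code, and it assumes the "FAILED <test id> - <message>" short-summary lines that -rA prints, as shown above.

```python
def failed_pass_to_pass(pytest_output, pass_to_pass):
    """Return the PASS_TO_PASS test ids that pytest reported as FAILED.

    Hypothetical helper: assumes summary lines of the form
    "FAILED <test id> - <message>", as printed by pytest -rA.
    """
    failed = set()
    for line in pytest_output.splitlines():
        line = line.strip()
        if not line.startswith("FAILED "):
            continue
        # Drop the "FAILED " prefix, then cut off the optional " - message" tail.
        test_id = line[len("FAILED "):].split(" - ", 1)[0]
        failed.add(test_id)
    # Keep only failures that the instance expects to pass without a patch.
    return sorted(failed & set(pass_to_pass))
```

Calling it with the captured output of the test_cmd and the instance's PASS_TO_PASS list returns the ids that fail before any patch is applied, which is how I would expect the four tests above to show up.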

Do these tests pass on your system? Since some of the projects in the benchmark require quite a number of system-level dependencies, it might be good to provide a Docker environment. Also, for tests that are not closely related to the target issue but behave unstably depending on the host environment, would you consider removing them from the benchmark?

Thank you for looking into this!

yuntongzhang, Apr 01 '24 14:04