Remove pytest pin
Description
Contributes to https://github.com/rapidsai/build-planning/issues/105.
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
The failure I see here looks like https://github.com/pytest-dev/pytest-cov/issues/693 (cf. https://github.com/pytest-dev/pytest-xdist/issues/1211).
I resolved the above issue by ignoring one error. The other failure looks to be due to our testing of the streaming executor for cudf-polars using dask in some way that is conflicting with the latest versions of pytest and xdist.
Going to note down a few things here that should not block this PR, but we might want to track moving forward (I'll open issues once I'm sure I've got a handle on all of this):
- Many of the test runs in
ci/test_python_cudf.shpass the--cov-configargument onward torun_cudf_pytests.sh. It is not clear to me if all of thesecoveragercfiles even exist, and there's not an immediately apparent way to determine which file (if any) is being used to setcoveragesettings for a given subproject.
This is made trickier because coverage will use the first vaild config file that it finds, starting with .coveragerc, then setup.cfg, tox.ini, then pyproject.toml.
It would be good to standardize on one choice (probably pyproject.toml) except for those cases where we deliberately need to override settings for a particular test run.
One of the remaining issues is that it seems like more recent versions of pytest are making it harder to guarantee that the current path is not added to sys.path. Even when I use --import-mode=importlib CI is failing for pylibcudf tests because some files from the source tree for pylibcudf are being imported instead of the versions in the install tree. --import-mode=append gets past this issue, but then fails for us because pylibcudf has multiple test modules with the same name in different subdirectories. I think moving the tests out of the pylibcudf package directory is the best way to avoid this problem altogether.
@vyasr -- I think I've got all the pytest errors squashed here. The pandas-test job is still failing, but looks more like legitimate test failures (although I don't know why)
After some offline discussion I added a few extra tests to the list of expected failures for pandas. It seems likely that the pandas version we are currently using has some issues with the latest pytest version. Those issues are likely fixed in the latest pandas release, so we can revisit these when we tackle #19099.
I can push up a few changes to address the review comments (and fix the bad pipefail)
Going to merge this once CI passes again, because it moves a lot of code and is likely to have merge conflicts (like the one I just resolved).
/merge
One set of tests is failing because of a behavior difference between .str.contains(..., na=None) and .str.contains(... na=""). Both raise NotImplementedError, but with na=None we emit a warning first. Which do we want?
In [1]: import cudf
In [2]: s = cudf.Series(['a', 'b', 'c'])
In [3]: s.str.contains('a', flags=False, na=None, regex=False)
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[3], line 1
----> 1 s.str.contains('a', flags=False, na=None, regex=False)
File /raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/cudf/core/accessors/string.py:750, in StringMethods.contains(self, pat, case, flags, na, regex)
744 warnings.warn(
745 "Allowing a non-bool 'na' in obj.str.contains is deprecated "
746 "and will raise in a future version.",
747 FutureWarning,
748 )
749 if na not in {no_default, np.nan}:
--> 750 raise NotImplementedError("`na` parameter is not yet supported")
751 if regex and isinstance(pat, re.Pattern):
752 flags = pat.flags & ~re.U
NotImplementedError: `na` parameter is not yet supported
In [4]: s.str.contains('a', flags=False, na='', regex=False)
/raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/cudf/core/accessors/string.py:744: FutureWarning: Allowing a non-bool 'na' in obj.str.contains is deprecated and will raise in a future version.
warnings.warn(
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[4], line 1
----> 1 s.str.contains('a', flags=False, na='', regex=False)
File /raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/cudf/core/accessors/string.py:750, in StringMethods.contains(self, pat, case, flags, na, regex)
744 warnings.warn(
745 "Allowing a non-bool 'na' in obj.str.contains is deprecated "
746 "and will raise in a future version.",
747 FutureWarning,
748 )
749 if na not in {no_default, np.nan}:
--> 750 raise NotImplementedError("`na` parameter is not yet supported")
751 if regex and isinstance(pat, re.Pattern):
752 flags = pat.flags & ~re.U
NotImplementedError: `na` parameter is not yet supported
The rest of the failures seem to be from a changed warning message. Overall, this set of failures looks like pytest 8 has uncovered some real issues for us.
The second failure is for cudf.pandas and weakrefs:
from pandas import DataFrame
import weakref
df = DataFrame({"a": [0, 1], "b": [2, 3]})
for name in ("loc", "iloc", "at", "iat"):
getattr(df, name)
wr = weakref.ref(df)
del df
assert wr() is None
If I add a gc.get_referrers(wr()) I see [<cell at 0x7f0098ce3f10: DataFrame object at 0x7f00b41f1f40>] is the thing referencing df.
Note that inserting a gc.collect() fixes the failure, so this isn't a memory leak. Just a cyclic reference somewhere delaying collection of df until the gc runs.
I've pushed a couple of test-only fixes, but if we decide to change the behavior in https://github.com/rapidsai/cudf/pull/19127#issuecomment-3046431952 then I'll push a fix here or handle that in a separate PR. I've assumed for now that the weakref.ref(pandas.DataFrame) is expected / unavoidable and have just skipped that test.
One set of tests is failing because of a behavior difference between
.str.contains(..., na=None)and.str.contains(... na=""). Both raiseNotImplementedError, but withna=Nonewe emit a warning first. Which do we want?In [1]: import cudf In [2]: s = cudf.Series(['a', 'b', 'c']) In [3]: s.str.contains('a', flags=False, na=None, regex=False) --------------------------------------------------------------------------- NotImplementedError Traceback (most recent call last) Cell In[3], line 1 ----> 1 s.str.contains('a', flags=False, na=None, regex=False) File /raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/cudf/core/accessors/string.py:750, in StringMethods.contains(self, pat, case, flags, na, regex) 744 warnings.warn( 745 "Allowing a non-bool 'na' in obj.str.contains is deprecated " 746 "and will raise in a future version.", 747 FutureWarning, 748 ) 749 if na not in {no_default, np.nan}: --> 750 raise NotImplementedError("`na` parameter is not yet supported") 751 if regex and isinstance(pat, re.Pattern): 752 flags = pat.flags & ~re.U NotImplementedError: `na` parameter is not yet supported In [4]: s.str.contains('a', flags=False, na='', regex=False) /raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/cudf/core/accessors/string.py:744: FutureWarning: Allowing a non-bool 'na' in obj.str.contains is deprecated and will raise in a future version. warnings.warn( --------------------------------------------------------------------------- NotImplementedError Traceback (most recent call last) Cell In[4], line 1 ----> 1 s.str.contains('a', flags=False, na='', regex=False) File /raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/cudf/core/accessors/string.py:750, in StringMethods.contains(self, pat, case, flags, na, regex) 744 warnings.warn( 745 "Allowing a non-bool 'na' in obj.str.contains is deprecated " 746 "and will raise in a future version.", 747 FutureWarning, 748 ) 749 if na not in {no_default, np.nan}: --> 750 raise NotImplementedError("`na` parameter is not yet supported") 751 if regex and isinstance(pat, re.Pattern): 752 flags = pat.flags & ~re.U NotImplementedError: `na` parameter is not yet supportedThe rest of the failures seem to be from a changed warning message. Overall, this set of failures looks like pytest 8 has uncovered some real issues for us.
I think this is expected behavior and updates to the test is the correct approach. Thanks @TomAugspurger !
Added a "DO NOT MERGE" temporarily till Vyas has a chance to chime in on my proposed changes.
One failure in the cudf.pandas tests:
FAILED python/cudf/cudf_pandas_tests/test_cudf_pandas.py::test_rmm_option_on_import[cuda] - AssertionError: assert 1 == 0
+ where 1 = CompletedProcess(args=['python', '-m', 'cudf.pandas', '/__w/cudf/cudf/python/cudf/cudf_pandas_tests/data/profile_basic...mory\nRuntimeError: CUDA error at: /__w/rmm/rmm/cpp/src/cuda_device.cpp:46: cudaErrorMemoryAllocation out of memory\n').returncode
This is different than the one that I was trying to debug with https://github.com/rapidsai/cudf/pull/19127/commits/7ccdaf5b480cda5de9e2ca3ab88e8e9c1a7d0bc6.