
CI: Test against old versions of key dependencies

Open seberg opened this issue 1 year ago • 10 comments

This adds explicit tests with old versions of key dependencies. Specifically:

  • numba==0.57
  • numpy==1.23
  • pandas==2.0
  • ~fsspec==0.6.0~ excluded: transformers==4.39.3 requires huggingface_hub, which requires fsspec>=2023.5.0. In principle it could be included only for conda, which doesn't pull in transformers, but that didn't seem worth the trouble.
  • cupy==12.0.0
  • pyarrow==16.1.0

See also https://github.com/rapidsai/build-planning/issues/81

(Marking as draft until I see that things work.)

seberg avatar Aug 15 '24 13:08 seberg

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Aug 16 '24 11:08 copy-pr-bot[bot]

@jameslamb maybe you can have a look w.r.t. the setup/yaml? One of the builds currently failed, but I doubt it is related to this change (the build steps should be unaffected).

@mroeschke pinging in case you already know what to do with the failures showing up. Many seem simple, but e.g. this run has a surprising number of failures.

(Have to check that this doesn't accidentally drop running "latest" tests for any setup.)

EDIT: Hmmm, I guess https://github.com/rapidsai/cudf/pull/15100 might be a good start for things that need to be re-added when running pandas <2.2.

seberg avatar Aug 16 '24 13:08 seberg

@jameslamb maybe you can have a look w.r.t. the setup/yaml? One of the builds currently failed, but I doubt it is related to this change (the build steps should be unaffected).

At a quick glance, the changes look like what I'd expect. Nice trick including cupy== alongside cupy-cu{major}x! I agree with you, that's a nice advantage of --constraint over --requirement.
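For illustration, that trick presumably boils down to a constraints file along these lines (this is a sketch, not the PR's actual file; the file name and the cupy-cuda12x spelling are assumptions, the other pins are the ones listed in the PR description). Because pip only applies a constraint to packages it is actually asked to install, the plain cupy pin and the CUDA-specific wheel pin can be listed side by side without pulling both into the environment, which a --requirement file could not do:

# constraints file passed to pip via --constraint (sketch)
numba==0.57
numpy==1.23
pandas==2.0
pyarrow==16.1.0
cupy==12.0.0        # constrains the plain package name
cupy-cuda12x==12.0.0  # constrains the CUDA 12 wheel name, only if that wheel is what gets installed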

If you fix conflicts + apply the one small suggestion I left, hopefully you'll see builds pass here. Once you've done that, @ me and I'd be happy to review again and more carefully.

jameslamb avatar Aug 19 '24 16:08 jameslamb

@seberg I just merged #16575 which is going to create even more merge conflicts for you, sorry 😬

Could you try to follow the patterns from that PR? They're a little bit stricter and safer than what we had before in the case where a CI job is installing multiple other CI artifacts (e.g. cudf_polars jobs installing cudf, pylibcudf, and cudf_polars wheels).

jameslamb avatar Aug 19 '24 21:08 jameslamb

pinging in case you already know what to do with the failures showing up

From a glance, the flavors of errors appear to be:

  • A potential bug/behavior in pandas 2.0 that differs from pandas 2.2.x (which cudf tries to align with)
  • A new API that was introduced post pandas 2.0 (I recently made cudf APIs align with pandas 2.2.x)

IIRC from offline discussion about testing multiple pandas versions, we didn't want to overly complicate the test suite with version checks for discrepancies in minor versions. I suspect we can largely use this pytest mark pattern for these:

@pytest.mark.skipif(
    PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION,
    reason="Fails in older versions of pandas",
)
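Applied to an individual test it would look something like the following; the import path is an assumption about where cudf keeps its pandas version constants (cudf.core._compat), and the test name and body are made up:

import pytest

from cudf.core._compat import PANDAS_CURRENT_SUPPORTED_VERSION, PANDAS_VERSION


@pytest.mark.skipif(
    PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION,
    reason="Fails in older versions of pandas",
)
def test_matches_pandas_22_behavior():
    # hypothetical test body: only meaningful against the pandas
    # version cudf currently targets (2.2.x)
    ...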

mroeschke avatar Aug 21 '24 01:08 mroeschke

This should be pretty much ready, although with some very heavy-handed xfails. It also includes some changes from https://github.com/rapidsai/cudf/pull/16595.

I did leave 2-3 failures (and may have missed some), because I wasn't sure if they were OK to silence. One error in orc seemed just very wrong, and another was an nvjit compilation failure IIRC.

I would appreciate it if someone could take over the last bit of dealing with the test suite here. I am happy to update the CI setup if there are issues.

seberg avatar Aug 21 '24 15:08 seberg

I would appreciate it if someone could take over the last bit of dealing with the test suite here

Sure, I can finish off the last few failures in the test suite tomorrow

mroeschke avatar Aug 21 '24 23:08 mroeschke

Posting the remaining error, in case someone has a better idea of where it may be coming from:

=================================== FAILURES ===================================
_ test_numba_mvc[\nimport numba.cuda\nimport cudf\nfrom cudf.utils._numba import _CUDFNumbaConfig\nfrom pynvjitlink.patch import (\n    patch_numba_linker as patch_numba_linker_pynvjitlink,\n)\n\npatch_numba_linker_pynvjitlink()\n\[email protected]\ndef test_kernel(x):\n    id = numba.cuda.grid(1)\n    if id < len(x):\n        x[id] += 1\n\ns = cudf.Series([1, 2, 3])\nwith _CUDFNumbaConfig():\n    test_kernel.forall(len(s))(s)\n] _
[gw1] linux -- Python 3.10.14 /pyenv/versions/3.10.14/bin/python

test = '\nimport numba.cuda\nimport cudf\nfrom cudf.utils._numba import _CUDFNumbaConfig\nfrom pynvjitlink.patch import (\n  ...n(x):\n        x[id] += 1\n\ns = cudf.Series([1, 2, 3])\nwith _CUDFNumbaConfig():\n    test_kernel.forall(len(s))(s)\n'

    @pytest.mark.parametrize(
        "test",
        [
            <snip is cuda 12>
            pytest.param(
                CUDA_12_TEST,
                marks=pytest.mark.skipif(
                    not IS_CUDA_12 or not HAVE_PYNVJITLINK,
                    reason="Minor Version Compatibility test for CUDA 12",
                ),
            ),
        ],
    )
    def test_numba_mvc(test):
        cp = subprocess.run(
            [sys.executable, "-c", test],
            capture_output=True,
            cwd="/",
        )
    
>       assert cp.returncode == 0
E       assert 1 == 0
E        +  where 1 = CompletedProcess(args=['/pyenv/versions/3.10.14/bin/python', '-c', '\nimport numba.cuda\nimport cudf\nfrom cudf.utils.... line 63, in PTXSource\n    kind = FILE_EXTENSION_MAP["ptx"]\nNameError: name \'FILE_EXTENSION_MAP\' is not defined\n').returncode

test_mvc.py:99: AssertionError
-------- generated xml file: /__w/cudf/cudf/test-results/junit-cudf.xml --------
=========================== short test summary info ============================
FAILED test_mvc.py::test_numba_mvc[\nimport numba.cuda\nimport cudf\nfrom cudf.utils._numba import _CUDFNumbaConfig\nfrom pynvjitlink.patch import (\n    patch_numba_linker as patch_numba_linker_pynvjitlink,\n)\n\npatch_numba_linker_pynvjitlink()\n\[email protected]\ndef test_kernel(x):\n    id = numba.cuda.grid(1)\n    if id < len(x):\n        x[id] += 1\n\ns = cudf.Series([1, 2, 3])\nwith _CUDFNumbaConfig():\n    test_kernel.forall(len(s))(s)\n] - assert 1 == 0
 +  where 1 = CompletedProcess(args=['/pyenv/versions/3.10.14/bin/python', '-c', '\nimport numba.cuda\nimport cudf\nfrom cudf.utils.... line 63, in PTXSource\n    kind = FILE_EXTENSION_MAP["ptx"]\nNameError: name \'FILE_EXTENSION_MAP\' is not defined\n').returncode
==== 1 failed, 96131 passed, 8288 skipped, 886 xfailed in 762.15s (0:12:42) ====

seberg avatar Aug 23 '24 12:08 seberg

@brandon-b-miller it appears you added the unit test failing in https://github.com/rapidsai/cudf/pull/16570#issuecomment-2307038325 in https://github.com/rapidsai/cudf/pull/13650. Do you happen to know if this test would be impacted by using an older version of numba (0.57) and CUDA 12?

mroeschke avatar Aug 23 '24 18:08 mroeschke

@brandon-b-miller it appears you added the unit test failing in #16570 (comment) in #13650. Do you happen to know if this test would be impacted by using an older version of numba (0.57) and CUDA 12?

I think this is coming from pynvjitlink having what seems to be a numba 0.58 minimum version.

https://github.com/rapidsai/pynvjitlink/blob/a2f23b7c3c237f2cdde3093c845e0453572503eb/pynvjitlink/patch.py#L11

pynvjitlink is needed for numba to launch kernels in CUDA 12 environments where MVC is required. Based on these observations alone, I suppose the failure is expected for this configuration. I will give this a deeper dive to make sure I'm understanding the situation correctly.
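If that's right, one option would be to gate the test on the numba floor that the pynvjitlink patch needs. A rough sketch, not the actual cudf test code, with the 0.58 floor taken from the patch.py line linked above:

import numba
import pytest
from packaging.version import Version

# Skip when the installed numba predates what pynvjitlink's
# patch_numba_linker appears to require.
requires_patchable_numba = pytest.mark.skipif(
    Version(numba.__version__) < Version("0.58"),
    reason="pynvjitlink's numba patching requires numba>=0.58",
)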

brandon-b-miller avatar Aug 23 '24 18:08 brandon-b-miller

Looks like with this new configuration, any ad-hoc job using the test_python_common dependency will use the oldest dependencies by default, e.g. unit-tests-cudf-pandas, pandas-tests, wheel-tests-cudf-polars, wheel-tests-cudf-dask.

Shouldn't we use the latest dependencies by default?

Specifically, the pandas-tests job clones the pandas 2.2 unit tests, but it installs pandas 2.0, so there may be some incompatibility that's causing the job to time out.

mroeschke avatar Sep 03 '24 22:09 mroeschke

The oldest-dependency usage is entirely governed by the logic in the various ci/* scripts, which read the value of RAPIDS_DEPENDENCIES and set up a constraint accordingly. If there are jobs that should not respect that and should always run with the latest deps (and you're right, I think there are), then those scripts should remove that logic entirely.

vyasr avatar Sep 04 '24 00:09 vyasr

Do others have thoughts on whether the wheel-tests-cudf-polars or wheel-tests-cudf-dask jobs should run with the oldest dependencies or with the latest dependencies?

mroeschke avatar Sep 04 '24 17:09 mroeschke

I think it's sufficient for just the one job running the main cudf tests to run with the oldest dependencies for now. We'll probably want to do something similar for polars once we start supporting a range. For dask, I don't think we'll ever be able to support a range anyway; we always pin tightly.

vyasr avatar Sep 04 '24 22:09 vyasr

/merge

mroeschke avatar Sep 04 '24 23:09 mroeschke