dask-ml CI Fixups

numpy 2 compat
changed error message

Jul 20 '24 22:07 TomAugspurger

There's still one error I haven't been able to fix:

================================================================================ test session starts ================================================================================
platform darwin -- Python 3.11.0, pytest-8.3.1, pluggy-1.5.0
rootdir: /Users/tom/gh/dask/dask-ml
configfile: pyproject.toml
plugins: cov-5.0.0, mock-3.14.0
collected 1 item

tests/test_incremental_pca.py F                                                                                                                                               [100%]

===================================================================================== FAILURES ======================================================================================
_______________________________________________________________________________ test_whitening[auto] ________________________________________________________________________________

svd_solver = 'auto'

    @pytest.mark.parametrize("svd_solver", ["full", "auto", "randomized"])
    @pytest.mark.filterwarnings("ignore:invalid value:RuntimeWarning")
    def test_whitening(svd_solver):
        # Test that PCA and IncrementalPCA transforms match to sign flip.
        X = datasets.make_low_rank_matrix(
            1000, 10, tail_strength=0.0, effective_rank=2, random_state=1999
        )
        X = da.from_array(X, chunks=[200, -1])
        prec = 3
        n_samples, n_features = X.shape
        for nc in [None, 9]:
            pca = PCA(whiten=True, n_components=nc, svd_solver=svd_solver).fit(X.compute())
            ipca = IncrementalPCA(
                whiten=True, n_components=nc, batch_size=250, svd_solver=svd_solver
            ).fit(X)

            Xt_pca = pca.transform(X)
            Xt_ipca = ipca.transform(X)
>           assert_almost_equal(np.abs(Xt_pca), np.abs(Xt_ipca), decimal=prec)

tests/test_incremental_pca.py:454:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: in inner
    return func(*args, **kwds)
../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: in inner
    return func(*args, **kwds)
.direnv/python-3.11/lib/python3.11/site-packages/numpy/_utils/__init__.py:85: in wrapper
    return fun(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (<function assert_array_almost_equal.<locals>.compare at 0x123d02160>, array([[3.46374689e-01, 6.42854227e-01, 1.28803...2.04242514e+05]]), dask.array<absolute, shape=(1000, 10), dtype=float64, chunksize=(200, 10), chunktype=numpy.ndarray>)
kwds = {'err_msg': '', 'header': 'Arrays are not almost equal to 3 decimals', 'precision': 3, 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError:
E           Arrays are not almost equal to 3 decimals
E
E           Mismatched elements: 1430 / 10000 (14.3%)
E           Max absolute difference among violations: 874440.31622524
E           Max relative difference among violations: 14845029.47333545
E            ACTUAL: array([[3.464e-01, 6.429e-01, 1.288e+00, ..., 8.527e-01, 4.654e-01,
E                   2.602e+05],
E                  [9.195e-02, 6.557e-01, 1.029e+00, ..., 8.861e-01, 3.697e-01,...
E            DESIRED: array([[0.346, 0.643, 1.288, ..., 0.853, 0.464, 1.238],
E                  [0.092, 0.656, 1.029, ..., 0.886, 0.369, 0.19 ],
E                  [0.092, 1.329, 1.784, ..., 0.104, 0.395, 0.606],...

../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: AssertionError
============================================================================== short test summary info ==============================================================================
FAILED tests/test_incremental_pca.py::test_whitening[auto] - AssertionError:
================================================================================= 1 failed in 0.42s =================================================================================

The only thing I've found so far is that components_ differs when whiten=True.
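One possible reason whitened output is so sensitive here (an assumption, not a confirmed diagnosis): the test data has effective rank 2 but 10 features, so the trailing singular values are close to zero, and whitening divides by them. A minimal NumPy-only sketch of that effect follows. The exactly rank-2 matrix is a hypothetical stand-in for make_low_rank_matrix, and the names X, Xc, and whiten_scale are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1999)

# Hypothetical stand-in for the test data: 1000 samples, 10 features, rank 2.
X = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 10))
Xc = X - X.mean(axis=0)

# PCA via SVD of the centered data.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S**2 / (X.shape[0] - 1)

# Whitening divides each projected column by sqrt(explained_variance),
# so near-zero variances turn rounding noise into very large values.
with np.errstate(divide="ignore"):
    whiten_scale = 1.0 / np.sqrt(explained_variance)

print("singular values:", S)
print("whitening scale:", whiten_scale)
```

If this is what is happening, small differences in how the trailing components are computed would be amplified in the transformed columns for those components, while the leading columns would still agree.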

Jul 20 '24 22:07 TomAugspurger

cc @fujiisoup in case you have a chance to look (no worries if not)

Jul 21 '24 15:07 TomAugspurger

Hi @TomAugspurger

Do you know when the test started failing? The failure does not seem related to this PR.

Jul 21 '24 17:07 fujiisoup

I investigated, and it looks like an upstream issue. I raised an issue [there](https://github.com/scikit-learn/scikit-learn/issues/29534).

With numpy==2.0, sklearn.decomposition.PCA seems to be unstable, sometimes giving strange values.

Jul 22 '24 00:07 fujiisoup

Thanks for looking into it. I've subscribed to the upstream issue in scikit-learn and will skip or adjust this test as needed with NumPy 2.0.
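A rough sketch of what skipping or adjusting could look like, assuming the test is gated on the installed NumPy major version. The NUMPY_GE_2 flag and the placeholder test are hypothetical and not taken from this PR.

```python
import numpy as np
import pytest

# Assumption: the major version of the installed NumPy is enough to gate on.
NUMPY_GE_2 = int(np.__version__.split(".")[0]) >= 2


@pytest.mark.xfail(
    NUMPY_GE_2,
    reason="PCA(whiten=True) looks unstable with NumPy 2; see scikit-learn#29534",
    strict=False,
)
def test_whitening_placeholder():
    # Hypothetical stand-in for the body of test_whitening.
    assert True
```

Using xfail with strict=False rather than skip keeps the test running, so it would show up as XPASS once the upstream behavior is fixed.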

Jul 22 '24 12:07 TomAugspurger

I'm going to merge these changes and deal with the other failures later.

Nov 25 '24 14:11 TomAugspurger