CI Fixups
- NumPy 2 compat
- changed error message
There's still one error I haven't been able to fix:
================================================================================ test session starts ================================================================================
platform darwin -- Python 3.11.0, pytest-8.3.1, pluggy-1.5.0
rootdir: /Users/tom/gh/dask/dask-ml
configfile: pyproject.toml
plugins: cov-5.0.0, mock-3.14.0
collected 1 item
tests/test_incremental_pca.py F [100%]
===================================================================================== FAILURES ======================================================================================
_______________________________________________________________________________ test_whitening[auto] ________________________________________________________________________________
svd_solver = 'auto'
@pytest.mark.parametrize("svd_solver", ["full", "auto", "randomized"])
@pytest.mark.filterwarnings("ignore:invalid value:RuntimeWarning")
def test_whitening(svd_solver):
# Test that PCA and IncrementalPCA transforms match to sign flip.
X = datasets.make_low_rank_matrix(
1000, 10, tail_strength=0.0, effective_rank=2, random_state=1999
)
X = da.from_array(X, chunks=[200, -1])
prec = 3
n_samples, n_features = X.shape
for nc in [None, 9]:
pca = PCA(whiten=True, n_components=nc, svd_solver=svd_solver).fit(X.compute())
ipca = IncrementalPCA(
whiten=True, n_components=nc, batch_size=250, svd_solver=svd_solver
).fit(X)
Xt_pca = pca.transform(X)
Xt_ipca = ipca.transform(X)
> assert_almost_equal(np.abs(Xt_pca), np.abs(Xt_ipca), decimal=prec)
tests/test_incremental_pca.py:454:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: in inner
return func(*args, **kwds)
../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: in inner
return func(*args, **kwds)
.direnv/python-3.11/lib/python3.11/site-packages/numpy/_utils/__init__.py:85: in wrapper
return fun(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (<function assert_array_almost_equal.<locals>.compare at 0x123d02160>, array([[3.46374689e-01, 6.42854227e-01, 1.28803...2.04242514e+05]]), dask.array<absolute, shape=(1000, 10), dtype=float64, chunksize=(200, 10), chunktype=numpy.ndarray>)
kwds = {'err_msg': '', 'header': 'Arrays are not almost equal to 3 decimals', 'precision': 3, 'verbose': True}
@wraps(func)
def inner(*args, **kwds):
with self._recreate_cm():
> return func(*args, **kwds)
E AssertionError:
E Arrays are not almost equal to 3 decimals
E
E Mismatched elements: 1430 / 10000 (14.3%)
E Max absolute difference among violations: 874440.31622524
E Max relative difference among violations: 14845029.47333545
E ACTUAL: array([[3.464e-01, 6.429e-01, 1.288e+00, ..., 8.527e-01, 4.654e-01,
E 2.602e+05],
E [9.195e-02, 6.557e-01, 1.029e+00, ..., 8.861e-01, 3.697e-01,...
E DESIRED: array([[0.346, 0.643, 1.288, ..., 0.853, 0.464, 1.238],
E [0.092, 0.656, 1.029, ..., 0.886, 0.369, 0.19 ],
E [0.092, 1.329, 1.784, ..., 0.104, 0.395, 0.606],...
../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: AssertionError
============================================================================== short test summary info ==============================================================================
FAILED tests/test_incremental_pca.py::test_whitening[auto] - AssertionError:
================================================================================= 1 failed in 0.42s =================================================================================
The only thing I've found so far is that the `components_` are different when `whiten=True`.
cc @fujiisoup in case you have a chance to look (no worries if not)
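To narrow down which components disagree, a sign-insensitive comparison of the two `components_` arrays may help. The following is a minimal sketch under assumptions: `component_diff_up_to_sign` is a hypothetical helper (not part of dask-ml or scikit-learn), and the two small arrays are made-up stand-ins for `pca.components_` and `ipca.components_`.

```python
import numpy as np


def component_diff_up_to_sign(a, b):
    """Max absolute difference per component (row), ignoring sign flips.

    Hypothetical helper for illustration; not part of dask-ml or scikit-learn.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Flip each row of b so it points the same way as the matching row of a.
    signs = np.sign(np.sum(a * b, axis=1, keepdims=True))
    signs[signs == 0] = 1.0  # orthogonal rows: leave b unchanged
    return np.max(np.abs(a - signs * b), axis=1)


# Made-up stand-ins for pca.components_ and ipca.components_.
pca_components = np.array([[0.6, 0.8], [-0.8, 0.6]])
ipca_components = np.array([[-0.6, -0.8], [0.8, 0.6]])

# One value per component; values near zero mean the rows agree up to sign.
diff = component_diff_up_to_sign(pca_components, ipca_components)
print(diff)
```

In this toy input the first row differs only by sign, so its entry is close to zero, while the second row genuinely differs. Applied to the real estimators, large entries would point at the components that diverge.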
Hi @TomAugspurger
Do you know when the test started failing? This PR does not seem related.
I investigated, and it seems like an upstream issue. I raised an issue [there](https://github.com/scikit-learn/scikit-learn/issues/29534).
With numpy==2.0, it seems that sklearn.decomposition.PCA is unstable, sometimes giving strange values.
Thanks for looking into it. I've subscribed to the upstream issue in scikit-learn and will skip or adjust this test as needed with NumPy 2.0.
I'm going to merge these changes and deal with the other failures later.
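One way the skip could look is sketched below. This is only an illustration under assumptions: `NUMPY_2` and `skip_on_numpy2` are hypothetical names, the test body is a placeholder rather than the real `test_whitening`, and only the `auto` case is marked because that is the one shown failing in the log above.

```python
import numpy as np
import pytest

# True when the installed NumPy has major version 2 or newer.
NUMPY_2 = int(np.__version__.split(".")[0]) >= 2

# Hypothetical marker; the reason string points at the upstream issue.
skip_on_numpy2 = pytest.mark.skipif(
    NUMPY_2,
    reason="PCA is unstable with NumPy 2, see scikit-learn issue 29534",
)


@pytest.mark.parametrize(
    "svd_solver",
    # Only the "auto" parametrization carries the skip marker.
    ["full", pytest.param("auto", marks=skip_on_numpy2), "randomized"],
)
def test_whitening_placeholder(svd_solver):
    # Placeholder body; the real test lives in tests/test_incremental_pca.py.
    assert svd_solver in {"full", "auto", "randomized"}
```

Attaching the marker through `pytest.param` keeps the `full` and `randomized` cases running under NumPy 2, so any new regression in those would still be caught.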