Incremental wrapper fails for IncrementalPCA
What happened:
When calling fit() on the Incremental wrapper with IncrementalPCA, the following error is raised:
AttributeError: 'numpy.ndarray' object has no attribute 'chunks'. It seems that the Dask Array is internally converted to a NumPy array before it reaches the estimator, which should not happen. I also looked at the scoring parameter, but it is not applicable to PCA and should not cause any issues during fit.
What you expected to happen: The Incremental wrapper should not convert the Dask Array to a NumPy array internally.
Minimal Complete Verifiable Example:
from dask_ml.datasets import make_classification  # make_classification with chunks= comes from dask_ml.datasets
from dask_ml.decomposition import IncrementalPCA
from dask_ml.wrappers import Incremental

X, _ = make_classification(n_samples=100000, n_features=100, chunks=10000)
pca = IncrementalPCA(n_components=8, batch_size=40000)
inc = Incremental(pca)

inc.partial_fit(X)  # Raises AttributeError: 'numpy.ndarray' object has no attribute 'chunks'
pca.partial_fit(X)  # This works
Environment:
- Dask version: 2022.1
- Python version: 3.9
- Operating System: Ubuntu
- Install method (conda, pip, source): Conda
I'm curious: why are you using Incremental and IncrementalPCA together? I think dask_ml.decomposition.IncrementalPCA expects a whole Dask Array, but Incremental feeds the individual chunks of the Dask Array (plain NumPy arrays) to the underlying estimator's partial_fit, which would explain the complaint about a missing .chunks attribute. The combinations that should work are sketched below.
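
A minimal sketch of the two combinations that should work, assuming the chunk-wise behaviour described above; the DaskIncrementalPCA and SklearnIncrementalPCA names are just import aliases for illustration, not dask-ml API:

from dask_ml.datasets import make_classification
from dask_ml.decomposition import IncrementalPCA as DaskIncrementalPCA
from dask_ml.wrappers import Incremental
from sklearn.decomposition import IncrementalPCA as SklearnIncrementalPCA

X, _ = make_classification(n_samples=100000, n_features=100, chunks=10000)

# Option 1: dask-ml's IncrementalPCA consumes the Dask Array directly,
# as in the last line of the reproducer above.
dask_pca = DaskIncrementalPCA(n_components=8, batch_size=40000)
dask_pca.partial_fit(X)

# Option 2: Incremental passes NumPy chunks to partial_fit, so wrap
# scikit-learn's IncrementalPCA, which accepts NumPy arrays.
inc = Incremental(SklearnIncrementalPCA(n_components=8))
inc.partial_fit(X)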