dask-ml

`TypeError` when predicting non-array data with `dask-expr`

Open aazuspan opened this issue 11 months ago • 1 comments

Describe the issue:

Predicting on non-array data (for example, a Dask DataFrame) with a ParallelPostFit estimator raises a TypeError when dask-expr is enabled. It looks like dask-ml relies on an isinstance check against dask.dataframe._Frame to identify dataframes, but as of the 2024.3.0 release of dask, that attribute is no longer a class by default, so the check itself fails.

Minimal Complete Verifiable Example:

from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from dask_ml.wrappers import ParallelPostFit
import dask.dataframe as dd


X, y = make_regression()

est = ParallelPostFit(RandomForestRegressor()).fit(X, y)
est.predict(dd.from_array(X))

raises...

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/... Cell 3 line 3
      1 X, y = make_regression()
      2 est = ParallelPostFit(RandomForestRegressor()).fit(X, y)
----> 3 est.predict(dd.from_array(X))

File ~/.../python3.10/site-packages/dask_ml/wrappers.py:327, in ParallelPostFit.predict(self, X)
    322     result = X.map_blocks(
    323         _predict, estimator=self._postfit_estimator, drop_axis=1, meta=meta
    324     )
    325     return result
--> 327 elif isinstance(X, dd._Frame):
    328     if meta is None:
    329         meta = dd.core.no_default

TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union

Anything else we need to know?:

Thanks!

Environment:

  • Dask version: 2024.3.1
  • Python version: 3.10.13
  • Operating System: PopOS
  • Install method (conda, pip, source): pip

aazuspan, Mar 19 '24 07:03

Thanks for the report. @milesgranger are you able to take a look?

TomAugspurger, Mar 19 '24 13:03

Looks like we need a release. That bit of code was updated as part of the dask-expr integration in https://github.com/dask/dask-ml/pull/980; see https://github.com/dask/dask-ml/blob/b3954e9ee1f7d7dee5ebdd5c4ca1b84f4dd96797/dask_ml/wrappers.py#L328

milesgranger, Mar 20 '24 09:03

Thanks for confirming. I can cut a release now.

TomAugspurger, Mar 20 '24 12:03

https://github.com/dask/dask-ml/actions/runs/8359352429 should be the job that pushes to PyPI

TomAugspurger, Mar 20 '24 12:03

Super, thanks @TomAugspurger! @aazuspan, once you've updated dask-ml, I'll let you close this if the new release fixes the issue. Thanks!

milesgranger, Mar 20 '24 13:03

Works as expected, thanks for the quick release!

aazuspan, Mar 20 '24 17:03
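
Since the resolution here was to upgrade dask-ml, it can help to confirm which versions are actually installed in the active environment before and after upgrading. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the package names are assumed to match their PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as assumed to be published on PyPI.
for name in ("dask", "dask-ml", "dask-expr"):
    print(name, installed_version(name) or "not installed")
```

If dask-ml reports an older version than expected after upgrading, the upgrade may have been applied to a different environment than the one running the code.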