dask-sql

[BUG] Prediction with empty partitions fails on sklearn dask-ml models

Open VibhuJawa opened this issue 3 years ago • 5 comments

Prediction with empty partitions fails on sklearn dask-ml models. This is because sklearn currently errors on empty frames. I am opening this issue here to track the best approach (whether it's a fix that should go in dask-ml, sklearn, or dask-sql).

Trace:

Exception: "ValueError('Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required.')"
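For illustration, here is a minimal sketch of the failure mode and one possible per-partition guard. It is not the fix adopted in any of the libraries discussed here. `StrictModel` and `safe_predict` are hypothetical names: `StrictModel` is a stand-in that mimics sklearn's rejection of zero-row input, so the example runs with only NumPy installed.

```python
import numpy as np


class StrictModel:
    """Hypothetical stand-in for a fitted sklearn estimator.

    Like sklearn's input validation, it rejects inputs with zero rows.
    """

    def predict(self, X):
        X = np.asarray(X)
        if X.shape[0] == 0:
            raise ValueError(
                f"Found array with 0 sample(s) (shape={X.shape}) "
                "while a minimum of 1 is required."
            )
        # Toy prediction: sum of the features in each row.
        return X.sum(axis=1)


def safe_predict(model, part, dtype=float):
    """Predict on one partition, skipping the model call when it is empty."""
    part = np.asarray(part)
    if part.shape[0] == 0:
        # Return an empty result of the expected dtype instead of calling predict.
        return np.empty((0,), dtype=dtype)
    return model.predict(part)


model = StrictModel()
# The second "partition" has zero rows, like an empty Dask partition.
partitions = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.empty((0, 2))]
predictions = np.concatenate([safe_predict(model, p) for p in partitions])
```

In a Dask setting, a guard of this kind could be applied to each partition (for example through `map_partitions`), which is an assumption about where such a fix might live rather than a description of what dask-ml or dask-sql actually does.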

What happened:

%%sql
SELECT * FROM PREDICT(
  MODEL model,
  SELECT * FROM test_set limit 100
)

What you expected to happen:

I would expect this to work, similar to cuML.

Mar 02 '22 21:03 VibhuJawa

Is this an issue that can be narrowed down to a Dask-ML reproducer? If so, I would assume a fix would make sense there, as generally Dask APIs shouldn't run into issues if a dataframe contains empty partitions.

Mar 10 '22 15:03 charlesbluca

> Is this an issue that can be narrowed down to a Dask-ML reproducer? If so, I would assume a fix would make sense there, as generally Dask APIs shouldn't run into issues if a dataframe contains empty partitions.

Yup. The hope is that I can push a fix for this in Dask-ML. If not, then we fall back to a fix here. I would like to keep this issue open for tracking purposes.

Mar 10 '22 21:03 VibhuJawa

Makes sense to me. Feel free to ping this issue with any follow-up discussion / PRs on dask-ml.

Mar 11 '22 14:03 charlesbluca

Started issue https://github.com/dask/dask-ml/issues/911 and PR https://github.com/dask/dask-ml/pull/912 to fix this.

Mar 25 '22 02:03 VibhuJawa

Can we close this issue since we've eliminated all Dask-ML dependencies?

Dec 21 '22 19:12 sarahyurick