dask-sql
[BUG] Prediction with empty partitions fails on sklearn dask-ml models
Prediction with empty partitions fails on sklearn dask-ml models. This is because sklearn currently errors on empty frames. I am opening this issue here to track the best approach (whether it's a fix that should go in dask-ml, sklearn, or dask-sql).
Trace:
Exception: "ValueError('Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required.')"
What happened:
%%sql
SELECT * FROM PREDICT(
MODEL model,
SELECT * FROM test_set limit 100
)
What you expected to happen:
I would expect this to work, similar to cuML.
Is this an issue that can be narrowed down to a Dask-ML reproducer? If so, I would assume a fix would make sense there as generally Dask APIs shouldn't run into issues if a dataframe contains empty partitions
Yup. The hope is that I can push a fix for this in Dask-ML. If not, then I'll fall back to a fix here. I would like to keep this issue open for tracking purposes.
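For reference, here is a rough sketch of the kind of guard I have in mind, wherever it ends up living. It is not the actual Dask-ML or dask-sql code: `StubModel` and `predict_partition` are hypothetical names, and the stub only imitates the sklearn error from the trace above so the snippet runs with NumPy alone. The output dtype of the empty result is also an assumption.

```python
import numpy as np


class StubModel:
    """Hypothetical stand-in for an sklearn estimator.

    It rejects empty input with the same message as the trace above.
    """

    def predict(self, X):
        X = np.asarray(X)
        if X.shape[0] == 0:
            raise ValueError(
                f"Found array with 0 sample(s) (shape={X.shape}) "
                "while a minimum of 1 is required."
            )
        # Toy rule: label 1 when the row sum is positive, else 0.
        return (X.sum(axis=1) > 0).astype(np.int64)


def predict_partition(model, part, dtype=np.int64):
    """Predict on one partition, skipping the model call if it is empty."""
    part = np.asarray(part)
    if part.shape[0] == 0:
        # Return an empty result of the assumed output dtype instead of
        # handing a 0-row array to the estimator.
        return np.empty((0,), dtype=dtype)
    return model.predict(part)


# One non-empty and one empty partition, as LIMIT can produce.
partitions = [np.array([[1.0, 2.0], [-3.0, 1.0]]), np.empty((0, 2))]
results = [predict_partition(StubModel(), p) for p in partitions]
combined = np.concatenate(results)
```

With Dask, a function like this would typically be applied per partition (for example through `map_partitions`), so empty partitions would pass through without reaching the estimator.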
Makes sense to me - feel free to ping this issue with any follow up discussion / PRs on dask-ml
Started issue https://github.com/dask/dask-ml/issues/911 and PR https://github.com/dask/dask-ml/pull/912 to fix this.
Can we close this issue since we've eliminated all Dask-ML dependencies?