dask-ml

Documentation Issue with train_test_split and blockwise

Open • christhorn2 opened this issue 1 year ago • 1 comment

Describe the issue:

The API documentation for dask-ml's train_test_split states that blockwise=False is supported for Dask Arrays: "For Dask Arrays, set blockwise=False to shuffle data between blocks as well." https://ml.dask.org/modules/generated/dask_ml.model_selection.train_test_split.html#dask_ml.model_selection.train_test_split
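To make "shuffle data between blocks" concrete, here is a conceptual NumPy sketch (not dask-ml's implementation) contrasting a blockwise shuffle, where rows are only permuted inside their own chunk, with a global shuffle, where rows can move to any position. The helper names `blockwise_permutation` and `global_permutation` are hypothetical.

```python
import numpy as np

def blockwise_permutation(n, chunk, rng):
    # Hypothetical helper: permute row indices independently inside each
    # block of size `chunk`; no row ever leaves its original block.
    idx = np.arange(n)
    parts = [rng.permutation(idx[start:start + chunk]) for start in range(0, n, chunk)]
    return np.concatenate(parts)

def global_permutation(n, rng):
    # Hypothetical helper: permute all row indices at once, so rows can
    # move between blocks (what the docs describe for blockwise=False).
    return rng.permutation(n)

rng = np.random.default_rng(0)
bw = blockwise_permutation(8, 4, rng)  # first half stays within rows 0-3
gl = global_permutation(8, rng)        # rows 0-7 mixed freely
```

With a blockwise shuffle, the first four positions always hold some ordering of rows 0 to 3; with a global shuffle, any row can land anywhere.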

This also appears to be the intent of the code, which delegates the split to ShuffleSplit: https://github.com/dask/dask-ml/blob/567cfd7837c7616fd352e0efbcfcee42f351199c/dask_ml/model_selection/_split.py#L490

However, ShuffleSplit does not support blockwise=False:

https://github.com/dask/dask-ml/blob/567cfd7837c7616fd352e0efbcfcee42f351199c/dask_ml/model_selection/_split.py#L194

Minimal Complete Verifiable Example:

```python
from dask_ml.model_selection import train_test_split
import dask.array as da

x = da.arange(8, chunks=4)
train_test_split(x, blockwise=False)
# ...
# NotImplementedError: ShuffleSplit with blockwise=False has not been implemented yet.
```
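Until blockwise=False is implemented, one possible workaround is to build a single global permutation with NumPy on the client and index every Dask array with it. This is a minimal sketch under stated assumptions: `global_train_test_split` is a hypothetical helper, not part of dask-ml; it assumes all arrays share the same number of rows and have known chunk sizes; and it holds the full permutation (one integer per row) in client memory.

```python
import numpy as np
import dask.array as da

def global_train_test_split(*arrays, test_size=0.25, random_state=None):
    # Hypothetical helper (not a dask-ml API): shuffles rows across blocks.
    n = arrays[0].shape[0]
    if np.isnan(n):
        # Row count must be known to build the permutation.
        raise ValueError("unknown chunk sizes; call x.compute_chunk_sizes() first")
    n = int(n)
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n)             # one global permutation of all rows
    n_test = int(np.ceil(test_size * n))  # round the test size up
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    result = []
    for a in arrays:
        # Same indices for every array, so X and y stay aligned.
        result.extend([a[train_idx], a[test_idx]])
    return result

x = da.arange(8, chunks=4)
y = x * 10
x_train, x_test, y_train, y_test = global_train_test_split(
    x, y, test_size=0.25, random_state=0
)
```

Indexing a Dask array with an unsorted integer array can produce many small output chunks, so rechunking the results may be worthwhile for large inputs.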

Environment:

  • Dask version: 2024.4.4
  • Python version: 3.9.18
  • Operating System:
  • Install method (conda, pip, source): pip

christhorn2 • Aug 15 '24 21:08

Hey @christhorn2, can I work on this issue?

sameeksha-sunilkumar • Oct 11 '24 10:10