Documentation Issue with train_test_split and blockwise
Describe the issue:
The API documentation for dask-ml's train_test_split states that blockwise=False is supported for Dask Arrays: "For Dask Arrays, set blockwise=False to shuffle data between blocks as well." https://ml.dask.org/modules/generated/dask_ml.model_selection.train_test_split.html#dask_ml.model_selection.train_test_split
I think this is also what the code intends: for arrays, train_test_split delegates the split to ShuffleSplit: https://github.com/dask/dask-ml/blob/567cfd7837c7616fd352e0efbcfcee42f351199c/dask_ml/model_selection/_split.py#L490
However, ShuffleSplit does not support blockwise=False:
https://github.com/dask/dask-ml/blob/567cfd7837c7616fd352e0efbcfcee42f351199c/dask_ml/model_selection/_split.py#L194
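To make the documented difference concrete, here is a minimal, standard-library-only sketch of the two behaviors, assuming a 1-D index range split into equal chunks. blockwise_split, global_split and chunk_bounds are hypothetical helpers written for illustration; they model what the flag is documented to mean, not how dask-ml actually implements ShuffleSplit.

```python
import random


def chunk_bounds(n, chunk):
    """Return (start, stop) pairs for consecutive chunks of length `chunk`."""
    return [(start, min(start + chunk, n)) for start in range(0, n, chunk)]


def blockwise_split(n, chunk, test_frac, rng):
    """Hypothetical model of blockwise=True: shuffle and split each chunk on its own.

    No index leaves its chunk; the test fraction is applied to each chunk separately.
    """
    train, test = [], []
    for start, stop in chunk_bounds(n, chunk):
        idx = list(range(start, stop))
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def global_split(n, test_frac, rng):
    """Hypothetical model of blockwise=False: one permutation over all rows.

    Rows from different chunks are mixed before the single split.
    """
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


# Same shape as the reproducer below: 8 rows in chunks of 4.
train_b, test_b = blockwise_split(8, 4, 0.25, random.Random(0))
train_g, test_g = global_split(8, 0.25, random.Random(0))
print("blockwise test:", sorted(test_b))
print("global test:   ", sorted(test_g))
```

In the blockwise model each chunk of four contributes exactly one test index, while the global model may draw both test indices from the same chunk. That cross-block mixing is what the documentation promises for blockwise=False.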
Minimal Complete Verifiable Example:
from dask_ml.model_selection import train_test_split
import dask.array as da
x = da.arange(8, chunks=4)
train_test_split(x, blockwise=False)
....
NotImplementedError: ShuffleSplit with blockwise=False has not been implemented yet.
Environment:
- Dask version: 2024.4.4
- Python version: 3.9.18
- Operating System:
- Install method (conda, pip, source): pip
Hey @christhorn2, can I work on this issue?