dask-ml

Scalable Machine Learning with Dask

Results 137 dask-ml issues

[Workflow Run URL](https://github.com/dask/dask-ml/actions/runs/9787067536)

upstream

The scikit-learn implementation of train/test split (`sklearn.model_selection.train_test_split`) supports splitting data according to class labels (a stratified split) through the `stratify` argument. This is especially useful when datasets have high class...
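To illustrate what a stratified split does, here is a minimal plain-Python sketch. It is not scikit-learn's or dask-ml's implementation; the function name `stratified_split` and its parameters are hypothetical, chosen only for this example. It shuffles the indices of each class separately and takes the same fraction from every class, so the test set keeps the original class proportions.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with per-class proportions preserved.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# An imbalanced toy dataset: 90 samples of class "a", 10 of class "b".
labels = ["a"] * 90 + ["b"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
test_counts = Counter(labels[i] for i in test_idx)
```

With a 20% test fraction, the test set holds 18 samples of class "a" and 2 of class "b", the same 9:1 ratio as the full dataset. A purely random split gives no such guarantee for the minority class.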

It'd be great to have a Dask wrapper for `permutation_test_score` so that permutations of grid searches are easier to run in Dask.
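For context, a permutation test scores a model on the real labels, then on many shuffled copies of the labels, and reports how often the shuffled score matches or beats the real one. The plain-Python sketch below shows the idea only; `permutation_p_value` and `mean_gap` are hypothetical names, and the toy score stands in for a real cross-validated estimator score. The p-value uses the `(C + 1) / (n_permutations + 1)` form that scikit-learn documents for `permutation_test_score`.

```python
import random


def mean_gap(X, y):
    """Toy score: gap between the class-1 and class-0 means of a 1-D feature."""
    ones = [x for x, label in zip(X, y) if label == 1]
    zeros = [x for x, label in zip(X, y) if label == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def permutation_p_value(score_fn, X, y, n_permutations=100, seed=0):
    """Return (observed_score, p_value) for a label-permutation test.

    Hypothetical helper for illustration; not a dask-ml or scikit-learn API.
    """
    rng = random.Random(seed)
    observed = score_fn(X, y)
    shuffled = list(y)
    n_at_least = 0
    for _ in range(n_permutations):
        # Each permutation is independent of the others, which is what
        # makes this loop a natural candidate for parallel execution.
        rng.shuffle(shuffled)
        if score_fn(X, shuffled) >= observed:
            n_at_least += 1
    return observed, (n_at_least + 1) / (n_permutations + 1)


# Perfectly separated toy data: class 0 holds 0..9, class 1 holds 10..19.
X = list(range(20))
y = [0] * 10 + [1] * 10
observed, p_value = permutation_p_value(mean_gap, X, y)
```

Because every permutation only needs the data and a shuffled label vector, the iterations could be submitted as independent tasks, which is presumably the appeal of a Dask wrapper.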

This pull request removes a call to `delayed` which implicitly converted Dask Arrays to NumPy. Since `Delayed` objects are hashable, unlike Arrays, `id` is called on the Dask Array to get hashable...

Hello everyone, we are facing a problem when calling `dd.get_dummies` (or `DummyEncoder`) when using `Categorizer` to infer the categories. The problem seems to arise when two columns have the same...

dataframe

(Splitting out a request from #386.) `LogisticRegression` currently only supports binary classification (the `multi_class` argument is ignored). This feature request is to add multi-class support!

https://github.com/dask/dask-ml/blob/d5801584d092d8f13f1b38aaf4da5dc3caa6a213/dask_ml/datasets.py#L332 isn't great, especially in settings like Hyperband #221, that are using the distributed scheduler. We could probably replace

```python
rng = dask_ml.utils.check_random_state(random_state)
```

with

```python
rng = sklearn.utils.check_random_state(random_state)
```

...

good first issue

```
from dask_ml.compose import ColumnTransformer as dd_column_transformer
from sklearn.compose import ColumnTransformer as sk_column_transformer
from dask_ml.preprocessing import StandardScaler as dd_standard_scaler
from sklearn.preprocessing import StandardScaler as sk_standard_scaler
import dask.dataframe as dd
import...
```

It would be nice to see an example using the Dask/XGBoost handoff for parallel training and predicting. This is a common question and so would likely have high value. It...

good first issue
Documentation

Greetings! I recently used Dask to implement a distributed version of TF-IDF. I want to contribute to the Dask project by putting it somewhere. Would this be the correct repo?...
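As background on what such a contribution computes, the sketch below implements one common TF-IDF variant in plain Python on a single machine. It is not the contributor's distributed implementation, and it differs from scikit-learn's smoothed formula; the function name `tfidf` and the weighting choices (term frequency normalized by document length, `idf = log(N / df)`) are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Illustrative variant: tf = count / doc_length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["dask scales python", "dask scales pandas", "python is fun"]
weights = tfidf(docs)
```

In the third document, "fun" appears in only one document and therefore outweighs "python", which appears in two. The document-frequency count is the only step that needs information from the whole corpus, so in a distributed setting it would be the part computed as a reduction across partitions.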

Algorithm