dask-ml
Avoid compute in datasets
The dask.compute call at https://github.com/dask/dask-ml/blob/d5801584d092d8f13f1b38aaf4da5dc3caa6a213/dask_ml/datasets.py#L332 isn't great, especially in settings like Hyperband #221 that use the distributed scheduler.
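To make the pain concrete, here's a rough sketch of the situation (the local cluster setup and argument values are just placeholders): the generator returns lazy arrays, but the internal compute still blocks and runs a small graph on the scheduler first.

    # Sketch only; cluster setup and argument values are arbitrary.
    from dask.distributed import Client
    from dask_ml.datasets import make_classification

    client = Client(processes=False)

    # X and y come back as lazy dask arrays, but the internal
    # dask.compute(informative_idx, beta) has already run a small graph
    # on the cluster by the time this call returns.
    X, y = make_classification(n_samples=100_000, n_features=20, chunks=10_000)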
We could probably replace
rng = dask_ml.utils.check_random_state(random_state)
with
rng = sklearn.utils.check_random_state(random_state)
and draw informative_idx and random data to seed the dask.array.RandomState that is eventually used to generate the large random data.
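Roughly, the idea would look something like this (a sketch only; the helper name and the way the seed is drawn are my assumptions, not the actual patch):

    import dask.array as da
    from sklearn.utils import check_random_state

    def _split_random_state(random_state):
        # Hypothetical helper: use a concrete NumPy RandomState for the
        # small draws (e.g. informative_idx) ...
        rng = check_random_state(random_state)
        # ... and draw an integer seed from it for the dask.array
        # RandomState that lazily generates the big random blocks.
        seed = rng.randint(0, 2 ** 32 - 1)
        return rng, da.random.RandomState(seed)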
Is this still open? I used dask almost a year ago and I would like to contribute.
Yes, I think so. There's a draw_seed in utils that may help.
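Something like this, I think, though I'm writing from memory, so double-check draw_seed's actual signature in dask_ml/utils.py:

    from sklearn.utils import check_random_state
    from dask_ml.utils import draw_seed

    rng = check_random_state(0)
    # Draw an integer seed to hand to the dask-level RandomState; the
    # bounds here (and the positional call) are a guess at the signature.
    seed = draw_seed(rng, 0, 2 ** 32 - 1)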
Can I work on it?
That would be great. The docs have contributing guidelines.
The docs have contributing guidelines.
https://ml.dask.org/contributing.html
It seems to me that all the tests in test_datasets.py pass after just commenting out
informative_idx, beta = dask.compute(informative_idx, beta)
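As a quick sanity check beyond the unit tests, something like this still runs and gives arrays of the expected shape with that line commented out (argument values are arbitrary):

    import dask
    from dask_ml.datasets import make_classification

    # informative_idx and beta now stay lazy inside make_classification;
    # nothing is materialized until we compute X and y ourselves.
    X, y = make_classification(n_samples=1_000, n_features=10,
                               random_state=0, chunks=100)
    X_np, y_np = dask.compute(X, y)
    assert X_np.shape == (1_000, 10)
    assert y_np.shape == (1_000,)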
What do you think?
If all the tests pass then that should be fine.
If all the tests pass then that should be fine.
Are you talking about only the tests in test_datasets.py?
I meant the entire test suite, since other tests use it.