dask-ml

Avoid compute in datasets

Open TomAugspurger opened this issue 7 years ago • 10 comments

The compute call at https://github.com/dask/dask-ml/blob/d5801584d092d8f13f1b38aaf4da5dc3caa6a213/dask_ml/datasets.py#L332 isn't great, especially in settings like Hyperband (#221) that use the distributed scheduler.

We could probably replace

    rng = dask_ml.utils.check_random_state(random_state)

with

    rng = sklearn.utils.check_random_state(random_state)

and use it to draw

  1. informative_idx
  2. random data to seed the dask.array.random.RandomState that is eventually used to generate the large random data.
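The proposal above can be sketched roughly as follows. This is a minimal illustration, not the actual dask-ml code: the helper name `make_lazy_design` and its parameters are hypothetical, and it only assumes that `sklearn.utils.check_random_state` returns a NumPy `RandomState` and that `dask.array.random.RandomState` accepts an integer seed.

```python
import dask.array as da
import numpy as np
from sklearn.utils import check_random_state


def make_lazy_design(n_samples=1000, n_features=20, n_informative=5,
                     chunks=100, random_state=None):
    """Hypothetical helper: small draws are eager NumPy, big data stays lazy."""
    # Plain NumPy RandomState; drawing from it never touches a dask scheduler.
    rng = check_random_state(random_state)

    # 1. informative_idx is tiny, so draw it eagerly with NumPy.
    informative_idx = rng.choice(n_features, n_informative, replace=False)

    # 2. Draw an integer seed for the dask RandomState that generates the
    #    large array, so the result is still reproducible from random_state.
    seed = int(rng.randint(0, 2**31 - 1))
    dask_rng = da.random.RandomState(seed)

    # This only builds a task graph; no compute() is needed in this function.
    X = dask_rng.normal(0, 1, size=(n_samples, n_features),
                        chunks=(chunks, n_features))
    return X, informative_idx


X, informative_idx = make_lazy_design(random_state=0)
```

With this shape, `informative_idx` is already a concrete NumPy array, so nothing has to be computed inside the dataset function, while `X` remains a lazy dask array until the caller decides to compute it.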

TomAugspurger, Jul 03 '18 02:07

Is this still open? I used dask almost a year ago and I would like to contribute.

dma092, May 25 '19 08:05

Yes, I think so. There may be a draw_seed in utils that could help.

TomAugspurger, May 25 '19 12:05

Can I work on it?

dma092, May 26 '19 19:05

That would be great. The docs have contributing guidelines.

TomAugspurger, May 26 '19 22:05

> The docs have contributing guidelines.

https://ml.dask.org/contributing.html

stsievert, May 27 '19 01:05

It seems to me that all the tests in test_datasets.py pass after just commenting out informative_idx, beta = dask.compute(informative_idx, beta). What do you think?

dma092, Oct 13 '19 08:10

If all the tests pass then that should be fine.

TomAugspurger, Oct 14 '19 12:10

> If all the tests pass then that should be fine.

Are you talking about only the tests in test_datasets.py?

dma092, Oct 14 '19 19:10

I meant the entire test suite, since other tests use it.

TomAugspurger, Oct 14 '19 19:10