dask-ml
Avoid compute in datasets
The dask.compute call at https://github.com/dask/dask-ml/blob/d5801584d092d8f13f1b38aaf4da5dc3caa6a213/dask_ml/datasets.py#L332 isn't great, especially in settings like Hyperband #221 that use the distributed scheduler.
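To make the pain concrete, here's a rough sketch of the situation (the local cluster setup and argument values are just placeholders): the generator returns lazy arrays, but the internal compute still blocks and runs a small graph on the scheduler first.

    # Sketch only; cluster setup and argument values are arbitrary.
    from dask.distributed import Client
    from dask_ml.datasets import make_classification

    client = Client(processes=False)

    # X and y come back as lazy dask arrays, but the internal
    # dask.compute(informative_idx, beta) has already run a small graph
    # on the cluster by the time this call returns.
    X, y = make_classification(n_samples=100_000, n_features=20, chunks=10_000)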
We could probably replace
rng = dask_ml.utils.check_random_state(random_state)
with
rng = sklearn.utils.check_random_state(random_state)
and draw informative_idx and random data to seed the dask.array.RandomState that is eventually used to generate the large random data.
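Roughly, the idea would look something like this (a sketch only; the helper name and the way the seed is drawn are my assumptions, not the actual patch):

    import dask.array as da
    from sklearn.utils import check_random_state

    def _split_random_state(random_state):
        # Hypothetical helper: use a concrete NumPy RandomState for the
        # small draws (e.g. informative_idx) ...
        rng = check_random_state(random_state)
        # ... and draw an integer seed from it for the dask.array
        # RandomState that lazily generates the big random blocks.
        seed = rng.randint(0, 2 ** 32 - 1)
        return rng, da.random.RandomState(seed)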
Is this still open? I used dask almost a year ago and I would like to contribute.
Yes, I think so. There's a draw_seed in utils that may help.
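Something like this, I think, though I'm writing from memory, so double-check draw_seed's actual signature in dask_ml/utils.py:

    from sklearn.utils import check_random_state
    from dask_ml.utils import draw_seed

    rng = check_random_state(0)
    # Draw an integer seed to hand to the dask-level RandomState; the
    # bounds here (and the positional call) are a guess at the signature.
    seed = draw_seed(rng, 0, 2 ** 32 - 1)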
Can I work on it?
That would be great. The docs have contributing guidelines.
The docs have contributing guidelines.
https://ml.dask.org/contributing.html
It seems to me that all the tests in test_datasets.py pass after just commenting out
informative_idx, beta = dask.compute(informative_idx, beta)
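As a quick sanity check beyond the unit tests, something like this still runs and gives arrays of the expected shape with that line commented out (argument values are arbitrary):

    import dask
    from dask_ml.datasets import make_classification

    # informative_idx and beta now stay lazy inside make_classification;
    # nothing is materialized until we compute X and y ourselves.
    X, y = make_classification(n_samples=1_000, n_features=10,
                               random_state=0, chunks=100)
    X_np, y_np = dask.compute(X, y)
    assert X_np.shape == (1_000, 10)
    assert y_np.shape == (1_000,)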
What do you think?
If all the tests pass then that should be fine.
If all the tests pass then that should be fine.
Are you talking about only the tests in test_datasets.py?
I meant the entire test suite, since other tests use it.