Tom Augspurger comments

Results: 868 comments of Tom Augspurger


Idea: Read Grib2 directly from object storage using the Zarr library?

What are the actual work items here? I see that @martindurant has https://github.com/fsspec/kerchunk/pull/27 and @rsignell-usgs has https://nbviewer.org/gist/rsignell-usgs/ae7bfb16e8a4049bb1ca0379805c4dc2. Is there anything more to do to support grib2 through this reference filesystem...

Idea: Read Grib2 directly from object storage using the Zarr library?

Gotcha, thanks. So currently building the reference file with kerchunk requires downloading the grib file locally, but once you have the offsets you can read the bytes directly with HTTP...
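To illustrate the "read the bytes directly with HTTP" step, here is a minimal sketch of a ranged read using only the Python standard library. It is not kerchunk's API; the function names, the URL, and the offset and length values are hypothetical placeholders, and it assumes the server honors HTTP `Range` requests.

```python
from urllib.request import Request, urlopen


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the `- 1`.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_bytes(url: str, offset: int, length: int) -> bytes:
    """Fetch one byte range from `url` (hypothetical helper, not part of kerchunk)."""
    request = Request(url, headers=range_header(offset, length))
    with urlopen(request) as response:
        return response.read()


# Header for a 50-byte message starting at byte 100 (made-up numbers).
header = range_header(100, 50)
```

With the offset and length of a GRIB message taken from a reference file, `read_bytes(url, offset, length)` would fetch only that message rather than the whole file.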

No support for stratified split in dask_ml.model_selection.train_test_split

Agreed. Are you interested in working on this?

On Fri, Aug 9, 2019 at 3:18 PM Karanraj Chauhan wrote:
> scikit-learn implementation of train test split (sklearn.model_selection.train_test_split) supports splitting...

No support for stratified split in dask_ml.model_selection.train_test_split

That's great if you're willing to try. Let us know if you get stuck.

On Sun, Aug 11, 2019 at 6:40 PM Karanraj Chauhan wrote:
> Tempted to say yes,...

No support for stratified split in dask_ml.model_selection.train_test_split

@tiagofassoni great! dask-ml's OneHotEncoder may be helpful here. It will use the Categorical dtype for pandas dataframes. Otherwise you can (or maybe need to) pass the `categories` manually as a...

No support for stratified split in dask_ml.model_selection.train_test_split

I’m not aware of any progress. Perhaps Tiago can share a status update.

> On Feb 21, 2020, at 7:37 AM, Tim Huang wrote:
>
> > is there...

No support for stratified split in dask_ml.model_selection.train_test_split

> random_split but I couldn't find its source code. So I'm not 100% sure how to deal with that case.

That's in `dask.dataframe.DataFrame.random_split`

> compute all the categories beforehand and...

No support for stratified split in dask_ml.model_selection.train_test_split

It may be easiest to move this to a PR. We might be able to do things lazily for dask arrays; we'll probably just end up with unknown chunk sizes.
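For reference, the idea behind a stratified split can be sketched in plain Python. This is an eager, in-memory illustration only, not the dask-ml or scikit-learn implementation; the function name `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.25, seed=0):
    """Return (train_indices, test_indices), splitting each class separately.

    Each class contributes roughly `test_size` of its members to the test
    set, so class proportions are preserved in both parts.
    """
    rng = random.Random(seed)

    # Group row indices by their class label.
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train, test = [], []
    for class_indices in indices_by_class.values():
        rng.shuffle(class_indices)
        n_test = round(len(class_indices) * test_size)
        test.extend(class_indices[:n_test])
        train.extend(class_indices[n_test:])
    return sorted(train), sorted(test)


# Toy data: 8 rows of class "a" and 4 rows of class "b".
labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels, test_size=0.25)
```

A lazy version for dask collections would have to do this grouping per class without materializing the labels, which is where the unknown chunk sizes mentioned above come in.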

AttributeError when using dask_ml.model_selection.kfold object

Can you post the full traceback?

AttributeError when using dask_ml.model_selection.kfold object

Thanks. You might want to try using the single-threaded scheduler (or at least not the distributed scheduler), which should give cleaner tracebacks. I’m trying to narrow down where things are...
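A minimal sketch of switching to the single-threaded scheduler, assuming dask is installed. The `inc` function and the small task graph are placeholders rather than the code from the issue.

```python
import dask


@dask.delayed
def inc(x):
    # Placeholder task; any failure here surfaces as an ordinary traceback
    # because the single-threaded scheduler runs tasks in the calling thread.
    return x + 1


total = dask.delayed(sum)([inc(i) for i in range(3)])

# Option 1: choose the scheduler for a single compute call.
result = total.compute(scheduler="single-threaded")

# Option 2: choose it for everything inside a block.
with dask.config.set(scheduler="single-threaded"):
    result_in_block = total.compute()
```

Running in a single thread also makes it possible to step through the failing task with a regular debugger.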