Tom Augspurger
What are the actual work items here? I see that @martindurant has https://github.com/fsspec/kerchunk/pull/27 and @rsignell-usgs has https://nbviewer.org/gist/rsignell-usgs/ae7bfb16e8a4049bb1ca0379805c4dc2. Is there anything more to do to support grib2 through this reference filesystem...
Gotcha, thanks. So currently building the reference file with kerchunk requires downloading the grib file locally, but once you have the offsets you can read the bytes directly with HTTP...
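As a rough illustration of reading bytes directly over HTTP once an offset and length are known, here is a minimal standard-library sketch. The URL, offset, and length are hypothetical placeholders, and this is not how kerchunk or fsspec implement it internally; it only shows the byte-range request that such a reference entry makes possible.

```python
from urllib.request import Request, urlopen


def range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at byte `offset`."""
    # HTTP byte ranges are inclusive at both ends, hence the -1.
    last = offset + length - 1
    return Request(url, headers={"Range": f"bytes={offset}-{last}"})


def read_range(url, offset, length):
    """Fetch the byte range; the server must support Range requests."""
    with urlopen(range_request(url, offset, length)) as resp:
        return resp.read()


# Hypothetical URL and offsets, for illustration only (no request is sent here).
req = range_request("https://example.com/data.grib2", 1024, 512)
print(req.get_header("Range"))
```

Calling `read_range` with offsets taken from a reference file would fetch only that one message, without downloading the whole GRIB file.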
Agreed. Are you interested in working on this? On Fri, Aug 9, 2019 at 3:18 PM Karanraj Chauhan wrote: > scikit-learn implementation of train test split > (sklearn.model_selection.train_test_split) supports splitting...
That's great if you're willing to try. Let us know if you get stuck. On Sun, Aug 11, 2019 at 6:40 PM Karanraj Chauhan wrote: > Tempted to say yes,...
@tiagofassoni great! dask-ml's OneHotEncoder may be helpful here. It will use the Categorical dtype for pandas dataframes. Otherwise you can (or maybe need to?) pass the `categories` manually as a...
I’m not aware of any progress. Perhaps Tiago can share a status update. > On Feb 21, 2020, at 7:37 AM, Tim Huang wrote: > > > is there...
> random_split but I couldn't find its source code. So I'm not 100% sure how to deal with that case. That's in `dask.dataframe.DataFrame.random_split` > compute all the categories beforehand and...
It may be easiest to move this to a PR. We might be able to do things lazily for dask arrays; we'd probably just end up with unknown chunk sizes.
Can you post the full traceback?
Thanks. You might want to try the single-threaded scheduler (or at least avoid the distributed scheduler), which should give cleaner tracebacks. I’m trying to narrow down where things are...