Tom Augspurger
That would have caught the error causing the failure on staging right now :)
GitHub has a new triage role. We should decide what it takes to become a triager and document that policy here.
I'm playing with the example from https://github.com/pangeo-data/ml-workflow-examples/pull/2. See https://nbviewer.jupyter.org/gist/TomAugspurger/f23c5342bef938a120b83a11d1cae077 for the updates. On this subset, it seems like the dask + xarray overhead over h5py is about 2x. I think...
Closes #63
xref https://github.com/dask/dask-ml/pull/94
Ran into this for https://gist.github.com/TomAugspurger/30ec08cc29810b57b4cb4458828e46c9
Fixes https://github.com/dask/dask-glm/issues/13 (I think)
One side issue: is it safe to assume that `X` will always be chunked *only* along the rows? Perhaps we should...
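On the chunking question, a minimal sketch of how the assumption could be checked rather than taken for granted. It relies only on the fact that a dask array exposes `.chunks` as a tuple of per-axis block sizes; the helper name `is_chunked_along_rows_only` is hypothetical and not part of dask-glm or dask-ml.

```python
def is_chunked_along_rows_only(chunks):
    """Return True if every axis after the first is a single block.

    `chunks` has the shape of a dask array's `.chunks` attribute:
    one tuple of block sizes per axis, e.g. ((2, 2, 2), (3,)).
    This helper is a hypothetical illustration, not a library function.
    """
    return all(len(axis_chunks) == 1 for axis_chunks in chunks[1:])


# Six rows split into three blocks, three columns in one block.
row_chunked = ((2, 2, 2), (3,))
# Columns are split as well, so the assumption does not hold.
both_chunked = ((3, 3), (2, 1))

print(is_chunked_along_rows_only(row_chunked))
print(is_chunked_along_rows_only(both_chunked))
```

With a real dask array this would be called as `is_chunked_along_rows_only(X.chunks)`; whether to raise or rechunk when it returns `False` is left open here.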
xref https://github.com/dask/dask-ml/issues/84#issuecomment-34377215
It'd be good to clarify the boundaries of dask-glm and dask-ml. My motivation is building up a set of utilities in dask-ml for working generically with dask or NumPy arrays,...
Taking Matt's idea

> Are there benchmarks in Pandas that are appropriate to take?

Here's a bunch from some of https://github.com/pandas-dev/pandas/tree/master/asv_bench/benchmarks

All of these at least run. I need to...
What high-level areas do we want coverage in?

- [x] dask.order.order
- [ ] Anything from dask.core?
- [x] dask.optimization?
- [ ] task stealing?
- [ ] scheduler throughput...
This is a sketch for some sections of documentation that should go in the README.

## What to test?

Ideally, benchmarks measure how long *our project* (dask, distributed) spends doing...