Tom Augspurger comments

Results: 947 comments by Tom Augspurger

Add GMM

Agreed. On Tue, Jan 9, 2018 at 10:17 AM, Matthew Rocklin wrote: > I think it would be interesting to see how Dask's cholesky factorization > behaves here, but it...

Add GMM

Thanks for sharing @remiadon. One API question around your proposed `Coreset` class. > This would return a subsample of the original dask.array as a numpy.array, along with associated weights for...

Yes, the Transformer would also work well, but would I think require https://github.com/scikit-learn/enhancement_proposals/pull/15. I haven't read through that in a while, but I don't know how it proposes to deal...

Error trying to deserialize an object

Can you give a reproducible example? And perhaps post the full traceback? Just looking at the code, I don't see what attribute would be causing the issue.

Error trying to deserialize an object

Thanks for the update. I don’t see any reason why `mean_` should be a Dask array. It should probably be set to the concrete value with the rest in https://github.com/dask/dask-ml/blob/980b3cb84e65f5508004fa1cd767d2c1122bc581/dask_ml/decomposition/pca.py#L291-L304...

Online fit - WIP

Apologies for letting this linger @thomasgreg. We're moving further development of dask-searchcv into https://github.com/dask/dask-ml. https://github.com/dask/dask-ml/pull/221 is implementing Hyperband. If you're interested in picking this up again, we could maybe reuse...

ENH: spatial partitioning of the GeoDataFrame

What's the plan for `dask_geopandas.GeoDataFrame.set_index("")`? dask.dataframe's `set_index` differs from pandas since it sorts / shuffles the data by the column being assigned to the index and then partitions the output...
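
To make the sort-then-partition behaviour described above concrete, here is a rough, library-free sketch. It is not dask's implementation: `set_index_sketch`, its parameters, and the sample rows are hypothetical names made up for illustration, and it assumes a non-empty list of dict rows.

```python
def set_index_sketch(rows, key, npartitions):
    """Sort rows by `key`, then split them into contiguous, ordered partitions.

    Hypothetical illustration only; assumes `rows` is non-empty.
    """
    ordered = sorted(rows, key=lambda row: row[key])
    size = -(-len(ordered) // npartitions)  # ceiling division
    partitions = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Boundaries: the first key of each partition, plus the last key overall.
    divisions = [part[0][key] for part in partitions] + [partitions[-1][-1][key]]
    return partitions, divisions


rows = [
    {"id": 5, "v": "e"}, {"id": 1, "v": "a"}, {"id": 4, "v": "d"},
    {"id": 2, "v": "b"}, {"id": 3, "v": "c"}, {"id": 6, "v": "f"},
]
partitions, divisions = set_index_sketch(rows, "id", npartitions=3)
print(divisions)
```

Unlike a pandas `set_index`, which only relabels rows, every row here may move to a different partition, which is why the choice of index column matters for a spatially partitioned frame.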

Metadata spec requires that geometry `columns` has no duplicates

> How else would we be able to uniquely identify a given column than its name? DataFrames / GeoDataFrames with duplicate column names seem problematic regardless. It'd have to be...

memory usage does not reflect size on disk

This could be clearer in [the dask documentation](https://docs.dask.org/en/stable/generated/dask.dataframe.DataFrame.repartition.html), but repartition, memory usage, etc. all are measuring the size of the objects in memory. This will differ from the size on...
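
The gap between the two sizes can be seen without dask at all. The following is a minimal stdlib-only sketch, assuming a made-up table of 10,000 repetitive rows; the in-memory figure is a rough estimate that counts shared objects once per reference, and gzip-compressed CSV stands in for whatever on-disk format is actually used.

```python
import gzip
import sys

# 10,000 rows with a repetitive text column, held as Python objects.
rows = [("category_a", i % 10) for i in range(10_000)]

# Rough in-memory footprint: the list, each row tuple, and each element.
# Shared objects are counted once per reference, so this is only an estimate.
in_memory = sys.getsizeof(rows) + sum(
    sys.getsizeof(row) + sum(sys.getsizeof(value) for value in row)
    for row in rows
)

# Stand-in for the on-disk footprint: the same rows as gzip-compressed CSV.
csv_bytes = "\n".join(f"{name},{value}" for name, value in rows).encode()
on_disk = len(gzip.compress(csv_bytes))

print(in_memory, on_disk)
```

Compression and compact encodings shrink the stored bytes, while object overhead inflates the in-memory figure, so a partition size chosen from file sizes can be far off once the data is loaded.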