Patrick Hoefler

Results: 345 comments by Patrick Hoefler

> So you're saying that the performance of tasks is comparable to disk on larger-than-memory datasets. Is that correct? What I am saying is that this is not something we should...

Thanks! Could you create your dataframe in a way that lets us reproduce this?

You don't have to check for performance. You can look at the graph and check that we don't have any rechunks in there, for example.
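A graph check like the one suggested above could be sketched roughly as follows. This is only an illustration: `task_names` and `has_rechunk` are hypothetical helpers, not part of dask, and the sketch assumes that rechunk tasks can be recognised by a key name starting with `rechunk`. It runs on hand-written toy graphs so that it does not depend on any particular dask version.

```python
def task_names(graph):
    """Return the name part of every key in a dask-style task graph."""
    names = set()
    for key in graph:
        # Keys are either plain strings or tuples like ("name-token", i, j).
        name = key[0] if isinstance(key, tuple) else key
        names.add(str(name))
    return names


def has_rechunk(graph):
    """True if any task name starts with 'rechunk' (assumed naming convention)."""
    return any(name.startswith("rechunk") for name in task_names(graph))


# Hand-written toy graphs standing in for a real collection's graph.
plain = {
    ("add-abc123", 0): (sum, [1, 2]),
    ("add-abc123", 1): (sum, [3, 4]),
}
rechunked = {**plain, ("rechunk-merge-def456", 0): (list, [("add-abc123", 0)])}

print(has_rechunk(plain))
print(has_rechunk(rechunked))
```

With a real collection, the mapping returned by its `__dask_graph__()` method could be passed in place of the toy dictionaries. Key names may differ between dask versions and after graph optimization, so the prefix test should be treated as a heuristic.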

This is expected. We are triggering a shuffle under the hood to avoid overloading a single worker. And a shuffle won't preserve the input order. I'll label this as a...

Investigations are welcome

Any chance you could try with the newest dask release? The MCVE doesn't warn for me anymore

Thanks for your report. Any advice on how we can make this work without adding scipy as a dependency for bags?