Patrick Hoefler
> So you're saying that the performance of tasks is comparable to disk on larger-than-memory datasets. Is that correct? What I am saying is that this is not something we should...
Do you have time to take a look?
@dask/gpu
Thanks! Could you create your dataframe in a way that we can reproduce this?
You don't have to check for performance. You can look at the graph and check that we don't have any rechunks in there, for example.
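A rough sketch of what such a graph check could look like. It assumes rechunk tasks carry key names starting with `rechunk`, and it uses a hand-written toy graph so it runs without dask; `key_name`, `find_layers`, and the keys in `toy_graph` are hypothetical names for illustration only.

```python
def key_name(key):
    """Return the name part of a graph key (a str, or a tuple whose first item is the name)."""
    return str(key[0]) if isinstance(key, tuple) else str(key)


def find_layers(graph, prefix="rechunk"):
    """Return the sorted, de-duplicated key names in `graph` that start with `prefix`."""
    return sorted({key_name(k) for k in graph if key_name(k).startswith(prefix)})


# Hypothetical toy graph standing in for a real task graph.
# Assumption: rechunk tasks are named with a "rechunk" prefix.
toy_graph = {
    ("ones-abc", 0, 0): None,
    ("rechunk-merge-def", 0, 0): None,
    ("rechunk-merge-def", 0, 1): None,
    ("sum-ghi", 0): None,
}

# A graph without any rechunk step, for comparison.
clean_graph = {("ones-abc", 0, 0): None, ("sum-ghi", 0): None}

print(find_layers(toy_graph))
print(find_layers(clean_graph))
```

With a real collection, the mapping could come from something like `dict(arr.__dask_graph__())`, and a test would then assert that `find_layers(...)` returns an empty list.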
This is expected. We are triggering a shuffle under the hood to avoid overloading a single worker. And a shuffle won't preserve the input order. I'll label this as a...
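To illustrate why the order changes, here is a toy model of a shuffle in plain Python. It is not dask's implementation: `toy_shuffle` is a hypothetical helper that assigns rows to partitions by `key % npartitions` and then reads the partitions back one after another.

```python
def toy_shuffle(rows, npartitions):
    """Distribute (key, value) rows across partitions by key, then concatenate the partitions."""
    partitions = [[] for _ in range(npartitions)]
    for key, value in rows:
        # Assumption for this sketch: a simple modulo stands in for the real partitioning function.
        partitions[key % npartitions].append((key, value))
    # Reading partition by partition groups rows by destination, not by input position.
    return [row for part in partitions for row in part]


rows = [(3, "a"), (1, "b"), (4, "c"), (1, "d"), (5, "e"), (9, "f"), (2, "g"), (6, "h")]
shuffled = toy_shuffle(rows, npartitions=3)

print(rows)
print(shuffled)
```

The output holds the same rows as the input, but grouped by destination partition, so the original row order is lost even though no row is dropped.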
Investigations are welcome
Any chance you could try with the newest dask release? The MCVE doesn't warn for me anymore
Thanks for your report. Any advice on how we can make this work without adding scipy as a dependency for bags?