Patrick Hoefler

345 comments by Patrick Hoefler

This hasn't been added yet, but it's a lot easier to do now with dask-expr and the new DataFrame implementation.

Contributions are welcome in this area!

Can you create a reproducer with a LocalCluster instead, something we can copy and paste?

Please use a LocalCluster for the reproducer; we don't want to install docker compose to reproduce something like this.

Thanks for your report. This seems to be a pandas problem and the way pandas handles the values object for Arrow Strings. FWIW, I would recommend merging if your...

Hi, thanks for your report. We are applying the filters when converting the dataset fragment to an arrow table. So this happens before we convert things to a DataFrame and...

So I looked into this again and the filters are attached to the read parquet expression in both cases (sorry that pprint doesn't show it). I am closing this for...

Do you really need the full DataFrame back? Folks normally write to storage when the result is large instead of bringing everything back to the client.

You shouldn't call compute on these large DataFrames; keep the data on the cluster and do your work there.