Patrick Hoefler
This hasn't been added yet, but it's a lot easier to implement now with dask-expr and the new DataFrame implementation.
Contributions are welcome in this area!
Hi, could you provide a reproducer?
Can you create a reproducer that uses a LocalCluster instead, something we can copy-paste?
Please use a LocalCluster for the reproducer; we don't want to install Docker Compose to reproduce something like this.
Thanks for your report. This seems to be a pandas problem, related to the way pandas handles the values object for Arrow strings. FWIW, I would recommend merging if your...
Hi, thanks for your report. We apply the filters when converting the dataset fragment to an Arrow table, so this happens before we convert things to a DataFrame and...
So I looked into this again, and the filters are attached to the read_parquet expression in both cases (sorry that pprint doesn't show it). I am closing this for...
Do you really need the full DataFrame back? If the result is large, folks normally write it to storage instead of pulling everything back to the client.
You shouldn't call compute on these large DataFrames; keep the data on the cluster and do your work there.