dask-geopandas icon indicating copy to clipboard operation
dask-geopandas copied to clipboard

ENH: minimal support for dask.dataframe query planning (dask-expr)

Open jorisvandenbossche opened this issue 11 months ago • 3 comments

Tackling https://github.com/geopandas/dask-geopandas/issues/284, for now in a minimal way, i.e. keeping as much as possible the current code based on "legacy" dask and then turn that into an expression.

For example, for sjoin/clip/file IO, we manually build up the graph to create a collection using HighLevelGraph, for now this graph is just turned into an expression with from_graph. Long term, we can turn each of those into proper expression classes.

I am not yet fully sure how to best organize the code. For now, the new file expr.py is essentially duplicating a lot of core.py. I organized the commits such that the first just does this copy, and then you can look at the second commit to see the actual changes that were needed to core.py (in expr.py) to make it work with dask-expr -> https://github.com/geopandas/dask-geopandas/commit/d28521f30ac0f37b2c0e9c6315eb3921a4896a76

The problem is that I can't do this directly, because we want to keep supporting legacy dask for a while as well (our code is actually also still using this for intermediate results)

Some other remarks:

  • There is one actual test failure that I haven't yet be able to debug/fix (split_out keyword in dissolve), this is xfailed for now
  • The spatial_partitions handling is not yet very robust. For now this just takes the same approach as in core.py to attach this to the collection. But longer term we should attach this to the underlying expression of the collection. The current version "works" (the tests are passing), as long as you only do an operation that needs them directly after an operation that sets them.
  • All the element-wise geospatial methods that we add to the classes are still done using map_partitions like before. Longer term, we should turn those into custom expressions (this can then also allow better optimizations in the expression tree)

jorisvandenbossche avatar Mar 11 '24 12:03 jorisvandenbossche