dask-geopandas
GeoPandas feature parity
A quick look at which spatial methods defined in GeoPandas are not yet available here:
import pandas as pd
import geopandas
import dask.dataframe as dd
import dask_geopandas
methods_pandas = set([n for n in dir(pd.DataFrame) if not n.startswith("_")])
methods_geopandas = set([n for n in dir(geopandas.GeoDataFrame) if not n.startswith("_")])
methods_dask = set([n for n in dir(dd.DataFrame) if not n.startswith("_")])
methods_dask_geopandas = set([n for n in dir(dask_geopandas.GeoDataFrame) if not n.startswith("_")])
methods_geopandas_extra = methods_geopandas - methods_pandas
methods_dask_geopandas_extra = methods_dask_geopandas - methods_dask
>>> methods_geopandas_extra - methods_dask_geopandas_extra
{'cascaded_union',
'covered_by',
'covers',
'estimate_utm_crs',
'explore',
'from_features',
'from_file',
'from_postgis',
'has_sindex',
'iterfeatures',
'overlay',
'rename_geometry',
'sjoin',
'sjoin_nearest',
'to_file',
'to_postgis',
'to_wkb',
'to_wkt'}
>>> methods_dask_geopandas_extra - methods_geopandas_extra
{'calculate_spatial_partitions',
'hilbert_distance',
'interpolate',
'morton_distance',
'set_geometry',
'to_dask_dataframe'}
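The comparison above can be wrapped in a small helper so it is easy to rerun as both libraries evolve. This is only a sketch: `public_names` and `parity_gap` are hypothetical helper names, and the stand-in classes exist purely so the snippet runs without pandas, geopandas, dask or dask_geopandas installed.

```python
def public_names(cls):
    """Return the set of public attribute names (no leading underscore) of a class."""
    return {name for name in dir(cls) if not name.startswith("_")}


def parity_gap(base, extended, other_base, other_extended):
    """Names that `extended` adds over `base` but `other_extended` does not add over `other_base`."""
    added = public_names(extended) - public_names(base)
    other_added = public_names(other_extended) - public_names(other_base)
    return added - other_added


# Hypothetical stand-ins for pd.DataFrame, geopandas.GeoDataFrame,
# dd.DataFrame and dask_geopandas.GeoDataFrame.
class Frame:
    def head(self): ...


class GeoFrame(Frame):
    def sjoin(self): ...
    def overlay(self): ...


class LazyFrame:
    def head(self): ...
    def compute(self): ...


class LazyGeoFrame(LazyFrame):
    def sjoin(self): ...
    def hilbert_distance(self): ...


# Spatial methods the eager class has but the lazy one lacks, and vice versa.
missing = parity_gap(Frame, GeoFrame, LazyFrame, LazyGeoFrame)
extra = parity_gap(LazyFrame, LazyGeoFrame, Frame, GeoFrame)
print(sorted(missing))
print(sorted(extra))
```

With the real libraries imported, the same call would be `parity_gap(pd.DataFrame, geopandas.GeoDataFrame, dd.DataFrame, dask_geopandas.GeoDataFrame)`.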
Some quick first notes:
covers and covered_by are 2 missing predicates that should be trivial to add here
sjoin was added as a method in geopandas, we should do the same here
One more aspect of this is API parity: for example, sjoin in geopandas now uses the predicate keyword, while here we still only have op.
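One possible way to close the keyword gap is to accept both names for a transition period. The sketch below shows only the keyword handling. `resolve_predicate` and `sjoin_sketch` are hypothetical names, not part of either library, the "intersects" default is an assumption, and no actual join is performed.

```python
import warnings


def resolve_predicate(predicate=None, op=None, default="intersects"):
    """Accept the newer `predicate` keyword while still honouring the older `op`."""
    if op is not None:
        if predicate is not None:
            raise ValueError("pass either `predicate` or `op`, not both")
        # stacklevel=3 attributes the warning to the caller of the public wrapper.
        warnings.warn(
            "`op` is deprecated, use `predicate` instead",
            FutureWarning,
            stacklevel=3,
        )
        return op
    return default if predicate is None else predicate


def sjoin_sketch(left, right, how="inner", predicate=None, op=None):
    """Hypothetical wrapper: resolves the keyword and returns the chosen options."""
    predicate = resolve_predicate(predicate, op)
    return {"how": how, "predicate": predicate}


print(sjoin_sketch(None, None, predicate="within"))
```

Old call sites using `op=...` keep working but emit a `FutureWarning`, while new call sites can use `predicate=...` as in geopandas.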
An updated version:
>>> methods_geopandas_extra - methods_dask_geopandas_extra
{'cascaded_union',
'clip_by_rect',
'estimate_utm_crs',
'explore',
'from_features',
'from_file',
'from_postgis',
'has_sindex',
'iterfeatures',
'overlay',
'sjoin_nearest',
'to_file',
From this list, I think overlay and sjoin_nearest are the most useful (but also complicated).
clip_by_rect is probably easy to add (since it's an element-wise operation, similar to intersection).
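To illustrate why an element-wise operation is easy to parallelise, here is a pure-Python sketch in which axis-aligned boxes `(minx, miny, maxx, maxy)` stand in for geometries. `clip_box` and `clip_partition` are hypothetical helpers. A real implementation would presumably apply the existing geopandas `clip_by_rect` to each partition, but that is an assumption and not shown here.

```python
def clip_box(box, rect):
    """Clip one axis-aligned box to a rectangle; return None if they do not overlap."""
    minx, miny, maxx, maxy = box
    rminx, rminy, rmaxx, rmaxy = rect
    clipped = (max(minx, rminx), max(miny, rminy), min(maxx, rmaxx), min(maxy, rmaxy))
    if clipped[0] >= clipped[2] or clipped[1] >= clipped[3]:
        return None
    return clipped


def clip_partition(partition, rect):
    """Apply the element-wise clip to every box in one partition."""
    return [clip_box(box, rect) for box in partition]


boxes = [(0, 0, 2, 2), (1, 1, 5, 5), (6, 6, 8, 8), (-3, -3, 1, 1)]
rect = (0, 0, 4, 4)

# Split the data into two partitions and process each independently.
partitions = [boxes[:2], boxes[2:]]
per_partition = [clip_partition(part, rect) for part in partitions]
combined = [box for part in per_partition for box in part]

# Processing the whole dataset at once gives the same answer.
whole = clip_partition(boxes, rect)
print(combined == whole)
```

Because each result depends only on its own row, no partition needs to see any other partition, which is what distinguishes this case from sjoin or overlay.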
to_file depends on how we want to handle writing multiple partitions: write multiple files, or append to a single file (which produces a serial bottleneck in the graph)?
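The two options can be contrasted with a small stdlib-only sketch. Newline-delimited JSON stands in for a real GIS file format, and the function names and the `part.N.jsonl` naming scheme are hypothetical, not an existing dask-geopandas API.

```python
import json
import tempfile
from pathlib import Path


def write_partition(records, path):
    """Write one partition to its own newline-delimited JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return Path(path)


def write_per_partition(partitions, directory):
    """Option 1: one file per partition; each write is independent of the others."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    return [
        write_partition(part, directory / f"part.{i}.jsonl")
        for i, part in enumerate(partitions)
    ]


def append_to_single_file(partitions, path):
    """Option 2: a single file; partitions have to be appended one after another."""
    path = Path(path)
    path.write_text("", encoding="utf-8")  # start from an empty file
    for part in partitions:
        with open(path, "a", encoding="utf-8") as f:
            for rec in part:
                f.write(json.dumps(rec) + "\n")
    return path


partitions = [
    [{"id": 1, "wkt": "POINT (0 0)"}, {"id": 2, "wkt": "POINT (1 1)"}],
    [{"id": 3, "wkt": "POINT (2 2)"}],
]

with tempfile.TemporaryDirectory() as tmp:
    files = write_per_partition(partitions, Path(tmp) / "parts")
    single = append_to_single_file(partitions, Path(tmp) / "all.jsonl")
    n_files = len(files)
    n_lines_single = len(single.read_text(encoding="utf-8").splitlines())

print(n_files, n_lines_single)
```

In the first option every call to `write_partition` could run as a separate task, whereas the loop in the second option is inherently ordered, which is the serial bottleneck mentioned above.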
explore should probably use datashader, as spatialpandas does, and certainly not dump the data to leaflet.
Big 👍 to overlay. Is it significantly more complicated than sjoin?
Is it significantly more complicated than sjoin?
I am afraid so. See https://github.com/geopandas/dask-geopandas/pull/217#issuecomment-1229241629
👍 to explore. Is this on the roadmap?
@alejohz I will tentatively say yes, with the note that it will most likely be a datashader-based method using the HoloViz ecosystem, so a bit different from explore in vanilla GeoPandas.
I am wondering whether overlay is being actively worked on here.
@Geoyi Not that I am aware of. It is on the roadmap, but the team's priority currently lies with the main GeoPandas project and adjacent work, so I don't think there is any active development of overlay at the moment. Anyone interested is welcome to pick it up.