
GeoPandas feature parity

Open jorisvandenbossche opened this issue 3 years ago • 9 comments

Quickly exploring which spatial methods that are defined in GeoPandas are not yet available here:

import pandas as pd
import geopandas
import dask.dataframe as dd
import dask_geopandas

# Public (non-underscore) attribute names of each DataFrame class
methods_pandas = {n for n in dir(pd.DataFrame) if not n.startswith("_")}
methods_geopandas = {n for n in dir(geopandas.GeoDataFrame) if not n.startswith("_")}
methods_dask = {n for n in dir(dd.DataFrame) if not n.startswith("_")}
methods_dask_geopandas = {n for n in dir(dask_geopandas.GeoDataFrame) if not n.startswith("_")}

# What each geo class adds on top of its non-spatial base class
methods_geopandas_extra = methods_geopandas - methods_pandas
methods_dask_geopandas_extra = methods_dask_geopandas - methods_dask

>>> methods_geopandas_extra - methods_dask_geopandas_extra
{'cascaded_union',
 'covered_by',
 'covers',
 'estimate_utm_crs',
 'explore',
 'from_features',
 'from_file',
 'from_postgis',
 'has_sindex',
 'iterfeatures',
 'overlay',
 'rename_geometry',
 'sjoin',
 'sjoin_nearest',
 'to_file',
 'to_postgis',
 'to_wkb',
 'to_wkt'}

>>> methods_dask_geopandas_extra - methods_geopandas_extra
{'calculate_spatial_partitions',
 'hilbert_distance',
 'interpolate',
 'morton_distance',
 'set_geometry',
 'to_dask_dataframe'}

Some quick first notes:

  • covers and covered_by are 2 missing predicates that should be trivial to add here
  • sjoin was added as a method in geopandas, we should do the same here

jorisvandenbossche avatar Jan 13 '22 21:01 jorisvandenbossche
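
The set comparison in the opening post can be wrapped in a small reusable helper. The following is a minimal sketch, not code from either library: the function names (`public_names`, `extra_names`, `parity_gap`) and the toy classes are hypothetical and only stand in for the real DataFrame classes so that the snippet runs without any third-party dependency.

```python
def public_names(cls):
    """Return the public attribute names of a class (no leading underscore)."""
    return {name for name in dir(cls) if not name.startswith("_")}


def extra_names(subclass, baseclass):
    """Names the subclass exposes on top of its base class."""
    return public_names(subclass) - public_names(baseclass)


def parity_gap(reference, reference_base, candidate, candidate_base):
    """Names added by `reference` that `candidate` has not added yet."""
    return extra_names(reference, reference_base) - extra_names(
        candidate, candidate_base
    )


# Toy stand-ins (hypothetical, for illustration only)
class Frame:
    def head(self): ...


class GeoFrame(Frame):
    def buffer(self): ...
    def overlay(self): ...


class LazyFrame:
    def head(self): ...
    def compute(self): ...


class LazyGeoFrame(LazyFrame):
    def buffer(self): ...
    def hilbert_distance(self): ...


missing = parity_gap(GeoFrame, Frame, LazyGeoFrame, LazyFrame)
print(sorted(missing))
```

With the real libraries installed, the same call would presumably be `parity_gap(geopandas.GeoDataFrame, pd.DataFrame, dask_geopandas.GeoDataFrame, dd.DataFrame)`, and swapping the two pairs gives the reverse difference.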

One more aspect of this is API parity: for example, sjoin in geopandas now uses the predicate keyword, while here we still only have op.

martinfleis avatar Jan 24 '22 10:01 martinfleis
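
One common way to close a keyword gap like this is to accept both names for a transition period and warn on the old one. The sketch below only illustrates that pattern under stated assumptions: `resolve_predicate` and the stand-in `sjoin` are hypothetical, perform no spatial join, and are not the signature of either library.

```python
import warnings


def resolve_predicate(predicate="intersects", op=None):
    """Return the spatial predicate to use, accepting the legacy `op` keyword.

    Hypothetical helper for illustration; not part of geopandas or
    dask-geopandas.
    """
    if op is not None:
        warnings.warn(
            "`op` is deprecated, use `predicate` instead.",
            FutureWarning,
            stacklevel=2,
        )
        predicate = op  # the legacy keyword wins when it is passed explicitly
    return predicate


def sjoin(left, right, how="inner", predicate="intersects", op=None):
    """Stand-in signature: only resolves the keyword, no join is performed."""
    predicate = resolve_predicate(predicate, op)
    return {"how": how, "predicate": predicate}


print(sjoin(None, None, predicate="within"))
```

Callers that still pass `op=...` keep working but see a `FutureWarning`, while new code can use `predicate=...` as in geopandas.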

An updated version:

>>> methods_geopandas_extra - methods_dask_geopandas_extra
{'cascaded_union',
 'clip_by_rect',
 'estimate_utm_crs',
 'explore',
 'from_features',
 'from_file',
 'from_postgis',
 'has_sindex',
 'iterfeatures',
 'overlay',
 'sjoin_nearest',
 'to_file',

I think overlay and sjoin_nearest are the most useful items on this list (but also complicated). clip_by_rect is probably easy to add, since it is an element-wise operation, similar to intersection. to_file depends on how we want to deal with writing multiple partitions: write multiple files, or append to a single file? The latter produces a serial bottleneck in the graph.

jorisvandenbossche avatar Apr 06 '22 08:04 jorisvandenbossche
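
The two to_file strategies mentioned above can be contrasted with a toy sketch. Everything here is an assumption made for illustration: partitions are plain lists of strings rather than GeoDataFrames, the function names and the `part.N` naming scheme are hypothetical, and a thread pool stands in for the task graph.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_partition(rows, path):
    """Write one partition to its own file, independent of all others."""
    path.write_text("\n".join(rows) + "\n")
    return path


def write_partitioned(partitions, directory, stem="part", suffix=".txt"):
    """One output file per partition, so the writes can run in parallel."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = [directory / f"{stem}.{i}{suffix}" for i in range(len(partitions))]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(write_partition, partitions, paths))


def append_to_single(partitions, path):
    """Single output file: each partition has to wait for the previous one."""
    path = Path(path)
    with path.open("w") as f:
        for rows in partitions:  # inherently sequential
            f.write("\n".join(rows) + "\n")
    return path


partitions = [["a", "b"], ["c"], ["d", "e", "f"]]  # toy data
with tempfile.TemporaryDirectory() as tmp:
    many = write_partitioned(partitions, Path(tmp) / "out")
    one = append_to_single(partitions, Path(tmp) / "out.txt")
    n_files = len(many)
    names = [p.name for p in many]
    single_lines = one.read_text().splitlines()

print(names)
```

The per-partition variant keeps every write independent at the cost of producing a directory of files; the single-file variant gives one artifact but forces the writes into a chain.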

explore should probably use datashader, as spatialpandas does. Certainly not dumping data to leaflet.

martinfleis avatar Apr 06 '22 09:04 martinfleis

Big 👍 to overlay. Is it significantly more complicated than sjoin?

rabernat avatar Aug 20 '22 08:08 rabernat

> Is it significantly more complicated than sjoin?

I am afraid so. See https://github.com/geopandas/dask-geopandas/pull/217#issuecomment-1229241629

martinfleis avatar Aug 27 '22 18:08 martinfleis

👍 to explore. Is this on the roadmap?

alejohz avatar Dec 07 '22 01:12 alejohz

@alejohz I will tentatively say yes, with the caveat that it will most likely be a datashader-based method using the HoloViz ecosystem, so a bit different from explore in vanilla GeoPandas.

martinfleis avatar Dec 30 '22 22:12 martinfleis

I am wondering whether overlay is being actively worked on here.

Geoyi avatar Mar 13 '23 15:03 Geoyi

@Geoyi I am not aware of any. It is on the roadmap, but the team's priority currently lies with the main GeoPandas project and adjacent ones, so I don't think overlay is under active development at the moment. Anyone interested is welcome to pick it up.

martinfleis avatar Mar 13 '23 16:03 martinfleis