cuspatial icon indicating copy to clipboard operation
cuspatial copied to clipboard

[DISCUSSION] Standardize API inputs with `GeoSeries`

Open isVoid opened this issue 3 years ago • 6 comments

Today the python computation API accepts raw structure of arrays input. Such as: https://github.com/rapidsai/cuspatial/blob/da1cb97a4df308b9d27566e4c9d9736ca06a9582/python/cuspatial/cuspatial/core/gis.py#L29 and https://github.com/rapidsai/cuspatial/blob/da1cb97a4df308b9d27566e4c9d9736ca06a9582/python/cuspatial/cuspatial/core/gis.py#L166-L173 The history of this input interface is that cuspatial was used to accelerate specific workflow where developers are very familiar with the underlying data structure of the geometry and used them. General GIS users are not required to know how the data layout before working with GIS functions, and @thomcom and I believe that using GeoSeries as the abstraction layer between user and underlying APIs is important.

cuSpatial's GeoSeries API is implemented based on GeoArrow specification: https://github.com/geopandas/geo-arrow-spec, under the hood it's an arrow UnionArray. This allows GeoSeries represent an array of pure or mixed geometry types. Currently many computation API only supports pure geometry array inputs. This falls within the range where GeoSeries is capable of supporting.

Ideally, the above APIs should look like:

def directed_hausdorff_distance(points: cuspatial.GeoSeries, space_offset: cudf.Series):
def points_in_polygons(points: cuspatial.GeoSeries, polygons: cuspatial.GeoSeries)

isVoid avatar Aug 08 '22 21:08 isVoid

Ideally, the above APIs should look like:

def directed_hausdorff_distance(points: cuspatial.GeoSeries):

Hmmm, this ignores the spaces in the Hausdorff distance API. This is a directed all-pairs distance computation. So the grouping and order of points matters.

harrism avatar Aug 09 '22 03:08 harrism

Edited PR description to address the change.

isVoid avatar Aug 09 '22 15:08 isVoid

I like this direction. API looks much cleaner. As long as we provide factories / converters from other layouts / dataframes.

harrism avatar Aug 09 '22 20:08 harrism

Thanks for creating this discussion. I agree that we should use GeoSeries in any API call that uses coordinates or offsets as much as possible.

thomcom avatar Aug 12 '22 12:08 thomcom

I propose that we create individual issues for each API that needs to be updated.

thomcom avatar Aug 12 '22 12:08 thomcom

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Sep 11 '22 13:09 github-actions[bot]

let's convert this to a milestone.

harrism avatar Oct 24 '22 21:10 harrism

I propose that we create individual issues for each API that needs to be updated.

Went with this way - this issue is now a meta issue that tracks the progress, where each API is tracked with a sub issue in the task list.

isVoid avatar Feb 14 '23 00:02 isVoid

This is done.

isVoid avatar Feb 24 '23 17:02 isVoid