Expose clustering functions to C API
This is an enhancement request, as a reminder to expose the clustering functions from #688 to the C API.
Following the previous discussion, there are several strategies to implement this, whether it is via some form of GeometryCollection or array(s) of geometries and/or cluster IDs, etc.
Any thoughts on an API that would work well for shapely?
Generally, arrays of ints representing cluster IDs would probably work best. Shapely is now array-oriented via the NumPy C-API.
I'm guessing the user would first collect their input geometry array:
input = GEOSGeom_createCollection(GEOS_GEOMETRYCOLLECTION, geoms, n);
Perhaps an alternative non-owning variant of GeometryCollection, e.g. GeometryArray, could be used instead:
input = GEOSGeometryArray_create(geoms, n);
After the input geometries are collected, a new function could be re-used by a few different clustering methods, with an optional distance parameter:
extern int GEOS_DLL *GEOSClusterFinder_create(
    const GEOSGeometry* input,
    const int method,      /* e.g. DBSCAN, ClusterWithin, or other method enumerated by `enum GEOSClusterMethods` */
    const double distance, /* if needed by method, otherwise ignored */
    int* clusterIds);
where clusterIds is an array the same size as the input geometry array. As for ID values, perhaps reserve 0 for "no cluster assigned" and use positive integers 1, 2, 3, ... otherwise.
I'm not sure about the ownership of clusterIds, but it could be created by the users, since it has a known size. A follow-up GEOSClusterFinder_destroy() would be expected at the end (and possibly GEOSGeometryArray_destroy() depending on how input is done).
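To make the caller-owned clusterIds idea concrete, here is a minimal sketch that does not use GEOS at all. Everything in it is hypothetical: `SketchPoint` stands in for a geometry, and `sketch_cluster_within` is an invented name for a simple distance-based grouping (union-find over all pairs), not the algorithm from #688. It only illustrates the convention discussed above: the caller allocates an int array of length n, 0 means "no cluster assigned" (here, a NULL input element), and clusters are numbered 1, 2, 3, ...

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a geometry: a 2D point. Not a GEOS type. */
typedef struct { double x, y; } SketchPoint;

/* Follow parent links to the root of the set containing i. */
static size_t sketch_find(size_t *parent, size_t i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]]; /* path halving */
        i = parent[i];
    }
    return i;
}

/*
 * Hypothetical function, not part of GEOS. Groups points that are linked
 * (directly or through a chain) by separations of at most `distance`.
 * Writes one ID per input element into the caller-owned clusterIds array:
 * 0 for NULL elements, otherwise 1, 2, 3, ... in order of first appearance.
 * Returns the number of clusters, or -1 if scratch memory cannot be allocated.
 */
int sketch_cluster_within(const SketchPoint *const *pts, size_t n,
                          double distance, int *clusterIds)
{
    if (n == 0)
        return 0;
    assert(pts != NULL && clusterIds != NULL);

    size_t *parent = malloc(n * sizeof *parent);
    if (parent == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        parent[i] = i;

    /* Merge every pair of non-NULL points within the distance threshold. */
    const double d2 = distance * distance;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (pts[i] == NULL || pts[j] == NULL)
                continue;
            const double dx = pts[i]->x - pts[j]->x;
            const double dy = pts[i]->y - pts[j]->y;
            if (dx * dx + dy * dy <= d2) {
                const size_t ri = sketch_find(parent, i);
                const size_t rj = sketch_find(parent, j);
                if (ri != rj)
                    parent[rj] = ri;
            }
        }
    }

    /* Number the clusters; 0 stays reserved for "no cluster assigned". */
    int next = 0;
    for (size_t i = 0; i < n; i++)
        clusterIds[i] = 0;
    for (size_t i = 0; i < n; i++) {
        if (pts[i] == NULL)
            continue;
        const size_t r = sketch_find(parent, i);
        if (clusterIds[r] == 0)
            clusterIds[r] = ++next;
        clusterIds[i] = clusterIds[r];
    }

    free(parent);
    return next;
}
```

With this shape the function only needs to free its own scratch memory, since the caller owns both the inputs and the output array. Whether a real API would also need a create/destroy pair is a separate question.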
@caspervdw and @jorisvandenbossche might have thoughts on the best approach too.
+1 for this.
The proposed mechanism of providing input geometries and returning cluster information sounds like the right direction. I'd suggest just having different functions for each clustering method, using the same calling pattern. This allows different parameter(s) for each clustering method, if required. It also makes documentation more straightforward.
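A rough illustration of the one-function-per-method idea, again without GEOS. The names `sketch_cluster_dbscan` and `sketch_cluster_within` are hypothetical, and plain 1-D doubles stand in for geometries to keep it short. Both functions share the same calling pattern (input, count, parameters, caller-owned clusterIds), but the DBSCAN variant takes an extra `min_points` argument that the other does not need. In this sketch 0 marks noise or unassigned values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Absolute difference of two values (stand-in for a geometry distance). */
static double sketch_absdiff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Count values within eps of v[i], including v[i] itself. */
static size_t sketch_count_neighbours(const double *v, size_t n,
                                      size_t i, double eps)
{
    size_t count = 0;
    for (size_t j = 0; j < n; j++)
        if (sketch_absdiff(v[i], v[j]) <= eps)
            count++;
    return count;
}

/*
 * Hypothetical DBSCAN-style function on 1-D values, not part of GEOS.
 * A value is a core point if at least min_points values (itself included)
 * lie within eps. Writes cluster IDs 1, 2, 3, ... into the caller-owned
 * clusterIds array, leaving 0 for noise. Returns the number of clusters,
 * or -1 if scratch memory cannot be allocated.
 */
int sketch_cluster_dbscan(const double *v, size_t n, double eps,
                          size_t min_points, int *clusterIds)
{
    if (n == 0)
        return 0;
    assert(v != NULL && clusterIds != NULL);

    /* Each index is pushed at most once, so n slots are enough. */
    size_t *stack = malloc(n * sizeof *stack);
    if (stack == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        clusterIds[i] = 0;

    int next = 0;
    for (size_t s = 0; s < n; s++) {
        if (clusterIds[s] != 0 ||
            sketch_count_neighbours(v, n, s, eps) < min_points)
            continue;
        /* s is an unassigned core point: start a cluster and grow it. */
        clusterIds[s] = ++next;
        size_t top = 0;
        stack[top++] = s;
        while (top > 0) {
            const size_t i = stack[--top];
            /* Border points join a cluster but do not extend it. */
            if (sketch_count_neighbours(v, n, i, eps) < min_points)
                continue;
            for (size_t j = 0; j < n; j++) {
                if (clusterIds[j] == 0 && sketch_absdiff(v[i], v[j]) <= eps) {
                    clusterIds[j] = next;
                    stack[top++] = j;
                }
            }
        }
    }

    free(stack);
    return next;
}

/*
 * Hypothetical distance-only variant with the same calling pattern.
 * With min_points = 1 every value is a core point, so the clusters are
 * simply the groups connected by gaps of at most `distance`.
 */
int sketch_cluster_within(const double *v, size_t n, double distance,
                          int *clusterIds)
{
    return sketch_cluster_dbscan(v, n, distance, 1, clusterIds);
}
```

A single entry point with a method enum would have to carry `min_points` as an ignored argument for the distance-only case, which is the kind of awkwardness separate functions avoid.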
Just to be clear, there is currently no such structure as GEOSGeometryArray. But such a structure would have some advantages over GeometryCollection:
- it can allow null or empty elements
- perhaps its implementation could be simpler?
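As a rough idea of how small such a structure could be, here is a sketch of a non-owning array wrapper. None of this exists in GEOS: `SketchGeom`, `SketchGeometryArray` and the `sketch_geometry_array_*` functions are all made-up names, and the struct layout is only an assumption. The wrapper borrows the caller's pointer array, permits NULL elements, and its destroy function frees only the wrapper itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a geometry object. Not a GEOS type. */
typedef struct SketchGeom { int id; } SketchGeom;

/* Non-owning view over a caller-owned array of geometry pointers. */
typedef struct {
    const SketchGeom *const *items; /* borrowed; elements may be NULL */
    size_t n;
} SketchGeometryArray;

/* Wrap an existing pointer array without copying or taking ownership.
 * Returns NULL if the wrapper cannot be allocated. */
SketchGeometryArray *sketch_geometry_array_create(
    const SketchGeom *const *items, size_t n)
{
    assert(items != NULL || n == 0);
    SketchGeometryArray *arr = malloc(sizeof *arr);
    if (arr == NULL)
        return NULL;
    arr->items = items;
    arr->n = n;
    return arr;
}

/* Free the wrapper only; the geometries and the pointer array are untouched. */
void sketch_geometry_array_destroy(SketchGeometryArray *arr)
{
    free(arr);
}

/* Number of slots, including NULL ones. */
size_t sketch_geometry_array_size(const SketchGeometryArray *arr)
{
    return arr->n;
}

/* Element i, which may legitimately be NULL. */
const SketchGeom *sketch_geometry_array_get(const SketchGeometryArray *arr,
                                            size_t i)
{
    assert(i < arr->n);
    return arr->items[i];
}

/* Count the NULL slots, e.g. to report how many inputs will get no result. */
size_t sketch_geometry_array_count_null(const SketchGeometryArray *arr)
{
    size_t count = 0;
    for (size_t i = 0; i < arr->n; i++)
        if (arr->items[i] == NULL)
            count++;
    return count;
}
```

Because positions are preserved, including the NULL ones, any per-element output such as cluster IDs lines up index-for-index with the caller's original array.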
I've been working on functions that operate on Simple Polygonal Coverages, and this kind of structure would be useful to define a C API for them as well.