FlowKit icon indicating copy to clipboard operation
FlowKit copied to clipboard

Implement Hartigan's leader algorithm as a C window function

Open greenape opened this issue 6 years ago • 5 comments

One of the most awesome things FlowKit does is provide an implementation of the Isaacman et al's meaningful locations algorithm. At present, the clustering portion of this, which uses Hartigan's leader algorithm, is written as a pl/pgSQL aggregate function and is pretty hairy, and very slow.

A much better fit here would be a C window function, ala PostGIS' DBSCAN and KMeans functions.

This isn't the easiest thing in the world to do, because the window-functions-in-C bit of Postgres is one of the more weakly documented areas, but would I suspect yield some big performance wins. (Might also be something worth contributing back to PostGIS.)

I'd suggest that the signature wants to be something like:

ST_Leader(point, weight, distance_threshold) -> cluster number (iirc, you can't return more than one value per row from a window function.)

We'd possibly want to pair this with a weighted centroid implementation as well, to facilitate matters, and make it easy to get the cluster centroid back after doing the assignment.

Alternatively, you can apparently write window functions using PL/R! So that's also a possibility.

greenape avatar Aug 05 '19 10:08 greenape

This post on the pg-development group has a minimal window function implementation: https://www.postgresql.org/message-id/[email protected] which is a decent reference.

greenape avatar Aug 05 '19 10:08 greenape

Also relevant, is this PR against PostGIS which added the DBSCAN window function: https://github.com/postgis/postgis/pull/86

greenape avatar Aug 05 '19 10:08 greenape

Other options on the return type might be a json blob, a composite type, or a custom type.

greenape avatar Sep 09 '19 09:09 greenape

Or maybe in rust, @Thingus?

greenape avatar Jun 24 '22 14:06 greenape

PGX (the Rust crate for talking to Postgres) only spits out json/jsonb right now, so as long as that's appropriate, it seems ideal

Thingus avatar Jun 27 '22 10:06 Thingus