duckdb_spatial icon indicating copy to clipboard operation
duckdb_spatial copied to clipboard

Feature : ST_ClusterDBSCAN implementation

Open baptistegirault opened this issue 1 year ago • 12 comments

PostGIS ST_ClusterDBSCAN is a really powerfull function that is only available in PostGIS. This function return cluster number from a list of geometry based on DBSCAN.

https://postgis.net/docs/ST_ClusterDBSCAN.html

baptistegirault avatar Sep 26 '24 12:09 baptistegirault

Is there any plan to work on it? As well as on ST_ClusterIntersect and ST_ClusterKMeans?

VGSML avatar May 20 '25 10:05 VGSML

Yes, I'm waiting for some changes in DuckDBs sorting and partitioning code to land, after which I will try to integrate these into spatial.

Maxxen avatar May 20 '25 10:05 Maxxen

We're also hoping to use the ST_ClusterDBSCAN. @Maxxen any idea if this is something that is weeks away or are we looking at months/year out?

stephensilber avatar May 23 '25 16:05 stephensilber

Hard to say for sure but Im currently reworking distance computations in general, after which a naive in-memory DBSCAN shouldn't be too hard to implement. The larger sorting and partitioning changes required to make it really performant are probably not going to be in place until next major release.

Maxxen avatar May 23 '25 19:05 Maxxen

ST_ClusterDBSCAN is very lacking

bochkov-vi avatar May 28 '25 08:05 bochkov-vi

Happen to have any updates here? We're looking at a migration to DuckDB but we use this function pretty heavily with PostGIS today and there's not really any other equivalent of the algorithm.

bsplosion avatar Aug 08 '25 02:08 bsplosion

Ive got ST_ClusterDBScan working from spatial's side, but it requires changes in DuckDB core in order to force the window function executor to always call the right callbacks. Im going to talk to the rest of the team to figure out how to proceed, and after that should be able to get it in for DuckDB v1.4 scheduled for September.

Maxxen avatar Aug 09 '25 22:08 Maxxen

One thing to consider here is the ability to call partition also as alot of the time you're unable to call DBScan on larger areas.

fhk avatar Aug 15 '25 23:08 fhk

Ive got ST_ClusterDBScan working from spatial's side, but it requires changes in DuckDB core in order to force the window function executor to always call the right callbacks. Im going to talk to the rest of the team to figure out how to proceed, and after that should be able to get it in for DuckDB v1.4 scheduled for September.

That would super helpful! There's no strong replacement algorithm available for us so having it in place on the next major release is exciting.

@fhk would there be a reason you can't call DBSCAN on larger areas? It does have density/distance function contraints, of course, but we haven't really seen any issues with other implementations of DBSCAN, even when it comes to large datasets. It should be able to run in O(n log n) time if implemented properly.

bsplosion avatar Aug 16 '25 15:08 bsplosion

Ah ! Depends on your usecase and what you mean by big.

Alot of the time you want to cluster more than a million points.

My current method of choice for this is to use BigQuery or cuml.

Specifically running a whole US state or even a Country.

Typically there will be too much GEOMETRY to fit it all in memory. So people work around that by either pre partitioning the data or doing it in the query like so - https://stackoverflow.com/questions/68811457/clustering-using-dbscan-in-bigquery

fhk avatar Aug 16 '25 15:08 fhk

@fhk interesting, I wonder if DuckDB's higher memory limits may help in this case - seems like BigQuery hits issues after 1GB in a partition.

I couldn't find any DuckDB documentation about any limits on per-partition memory, so this could still be a factor if it couldn't use all addressable space for some reason.

bsplosion avatar Aug 16 '25 15:08 bsplosion

Ive got ST_ClusterDBScan working from spatial's side, but it requires changes in DuckDB core in order to force the window function executor to always call the right callbacks. Im going to talk to the rest of the team to figure out how to proceed, and after that should be able to get it in for DuckDB v1.4 scheduled for September.

Hi @Maxxen , any chance this actually landed in 1.4? I see that version was released today but this issue is still marked as open, so I’m guessing it didn’t make the cut.

bsplosion avatar Sep 16 '25 18:09 bsplosion