sedona ST_DBSCAN error

DDL and DML:

```sql
create table geo_poi (id int, geom geometry);

insert into geo_poi values
  (1, ST_GeomFromText('POINT(124.406 39.908)')),
  (2, ST_GeomFromText('POINT(124.397 39.910)')),
  (3, ST_GeomFromText('POINT(124.397 39.905)'));
```

API doc problem? Calling ST_DBSCAN with three arguments fails:

`select st_dbscan(geom, 800, 2), id from geo_poi;`

Error running query: java.lang.IllegalArgumentException: function ST_DBSCAN takes at least 4 argument(s), 3 argument(s) specified

Checkpoint error: with a fourth argument, the query fails because no checkpoint directory is set:

`select st_dbscan(geom, 800, 2, true), id from geo_poi;`

Error running query: [_LEGACY_ERROR_TEMP_3016] org.apache.spark.SparkException: Checkpoint directory has not been set in the SparkContext

Fix: set a checkpoint directory on the SparkContext before running the query:

`sparkSession.sparkContext.setCheckpointDir("file:///tmp/checkpoints/")`
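For anyone hitting this from PySpark, here is a minimal sketch of both workarounds together: pass the fourth argument and set the checkpoint directory first. The helper names `dbscan_sql` and `run_dbscan` are made up for illustration, and the parameter name `use_spheroid` is my assumption about what the fourth boolean means. `spark` is assumed to be a SparkSession that already has Sedona's SQL functions registered and the `geo_poi` table available.

```python
def dbscan_sql(table, epsilon, min_points, use_spheroid=True):
    """Build the four-argument ST_DBSCAN query used in this thread.

    `use_spheroid` is an assumed name for the fourth (boolean) argument.
    """
    flag = "true" if use_spheroid else "false"
    return (
        f"select st_dbscan(geom, {epsilon}, {min_points}, {flag}), id "
        f"from {table}"
    )


def run_dbscan(spark, checkpoint_dir="file:///tmp/checkpoints/"):
    """Set the checkpoint directory, then run the clustering query.

    `spark` is assumed to be a SparkSession with Sedona registered.
    """
    # Without this call the query fails with
    # "Checkpoint directory has not been set in the SparkContext".
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    return spark.sql(dbscan_sql("geo_poi", 800, 2))
```

The local `file:///tmp/checkpoints/` path is the one from the fix above; on a cluster you would presumably want a location that every executor can reach.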

Mar 19 '25 08:03 freamdx

How about the DBSCAN algorithm in https://github.com/databrickslabs/geoscan?

Mar 19 '25 08:03 freamdx

doc fix: https://github.com/apache/sedona/pull/1870

Mar 19 '25 18:03 james-willis

> how about https://github.com/databrickslabs/geoscan dbscan algorithm?

Strictly speaking, that is not DBSCAN. I wanted to integrate a true DBSCAN into Sedona first.

I'm interested in integrating some algorithms that will be more performant by avoiding the connected components calculation, but I don't have the bandwidth at this time.
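To illustrate why connected components comes up at all, here is a single-machine sketch of DBSCAN phrased as a connected components problem over core points. It is a toy in plain Python with a union-find, not Sedona's implementation, and the function name `dbscan_components` and the sample points are made up for illustration.

```python
from math import dist


def dbscan_components(points, epsilon, min_points):
    """Label points by connected components of the core-point graph.

    Returns one label per point; -1 marks noise.
    """
    n = len(points)
    # Neighbors within epsilon; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= epsilon]
        for i in range(n)
    ]
    core = [len(nb) >= min_points for nb in neighbors]

    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Core points within epsilon of each other share a component.
    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    parent[find(i)] = find(j)

    labels = [-1] * n
    for i in range(n):
        if core[i]:
            labels[i] = find(i)
    # Border points take the cluster of the first core neighbor found.
    for i in range(n):
        if not core[i]:
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = find(j)
                    break
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = dbscan_components(points, epsilon=1.5, min_points=2)
```

With these sample points the first three land in one cluster, the next two in another, and the last one is noise. On a single machine the union-find is cheap; the distributed version of that step is the expensive part.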

Mar 19 '25 18:03 james-willis

@MrPowers We should probably add the checkpoint warning to this page and any others that describe using DBSCAN. In the meantime, I will see if we can avoid it altogether, perhaps by using the GraphX implementation of connected components in the GraphFrames library.

Mar 19 '25 18:03 james-willis