ST_DBSCAN error
- DDL & DML
```sql
create table geo_poi ( id int, geom geometry );
insert into geo_poi values
  ( 1, ST_GeomFromText('POINT(124.406 39.908)') ),
  ( 2, ST_GeomFromText('POINT(124.397 39.910)') ),
  ( 3, ST_GeomFromText('POINT(124.397 39.905)') );
```
- API doc issue? The three-argument call fails:
`select st_dbscan(geom, 800, 2), id from geo_poi;`
`Error running query: java.lang.IllegalArgumentException: function ST_DBSCAN takes at least 4 argument(s), 3 argument(s) specified`
- checkpoint error
`select st_dbscan(geom, 800, 2, true), id from geo_poi;`
`Error running query: [_LEGACY_ERROR_TEMP_3016] org.apache.spark.SparkException: Checkpoint directory has not been set in the SparkContext`
This can be fixed by setting a checkpoint directory before running the query:
`sparkSession.sparkContext.setCheckpointDir("file:///tmp/checkpoints/")`
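For anyone hitting the same error from PySpark, here is a minimal sketch of the workaround wrapped in a helper. It assumes `spark` is an existing SparkSession with the Sedona SQL functions registered and the `geo_poi` table already created; the helper name `run_dbscan` and the `CHECKPOINT_DIR` constant are hypothetical and only for illustration.

```python
# Checkpoint location used in the workaround; any writable path should do.
CHECKPOINT_DIR = "file:///tmp/checkpoints/"


def run_dbscan(spark, checkpoint_dir=CHECKPOINT_DIR):
    """Set the checkpoint directory, then run the four-argument ST_DBSCAN query.

    Assumes `spark` is a SparkSession with Sedona registered and that the
    `geo_poi` table exists (both are assumptions, not created here).
    """
    # This must happen before the query runs; otherwise Spark raises
    # "Checkpoint directory has not been set in the SparkContext".
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    # Same query as in the report, with the fourth argument supplied.
    return spark.sql("select st_dbscan(geom, 800, 2, true), id from geo_poi")
```

Calling `run_dbscan(spark).show()` should then print the result; on a real cluster the checkpoint path should probably point at shared storage rather than a local `/tmp` directory.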
How about the DBSCAN algorithm in https://github.com/databrickslabs/geoscan?
Doc fix: https://github.com/apache/sedona/pull/1870
> How about the DBSCAN algorithm in https://github.com/databrickslabs/geoscan?
This is not strictly DBSCAN. I wanted to integrate a true DBSCAN into Sedona first.
I'm interested in integrating some algorithms that will be more performant by avoiding the connected components calculation, but I don't have the bandwidth at this time.
@MrPowers We should probably add the checkpoint warning to this page and any others that describe using DBSCAN. In the meantime I will see if we can avoid it altogether, perhaps by using the GraphX implementation of connected components in the GraphFrames library.
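For readers wondering why connected components come up at all: DBSCAN clusters can be read as the connected components of a graph that links core points lying within epsilon of each other. The plain-Python sketch below illustrates that idea only. It is not Sedona's implementation, and it assumes planar Euclidean distance, a `min_pts` count that includes the point itself, and border points attached to the first core neighbor found. The function name is hypothetical.

```python
from math import hypot


def dbscan_connected_components(points, eps, min_pts):
    """Return one label per point: a cluster id (int) or None for noise."""
    n = len(points)
    # Neighborhood of each point within eps (a point is its own neighbor).
    neighbors = [
        [j for j in range(n)
         if hypot(points[i][0] - points[j][0],
                  points[i][1] - points[j][1]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]

    # Union-find over point indices; the root is the smallest index seen.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Connected components of the core-point graph.
    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    union(i, j)

    labels = [None] * n
    for i in range(n):
        if core[i]:
            labels[i] = find(i)
    # Border points join the cluster of the first core neighbor found.
    for i in range(n):
        if not core[i]:
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = find(j)
                    break
    return labels
```

In a distributed setting the union-find step is the part that turns into an iterative graph job, which is where the checkpoint requirement comes from.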