turf icon indicating copy to clipboard operation
turf copied to clipboard

Segmentation fault when using clustersDbscan

Open Felankia opened this issue 11 months ago • 7 comments

Version: 7.2.0

Code:

let points = [] // A huge point set. About one million GPS points.
points.splice(72026); //  The threshold is 72026
let clustered = turf.clustersDbscan(turf.featureCollection(points), 0.1);

Stacktrace:

Segmentation fault (core dumped)

And I can't find any dumpfile in jsfile folder

Felankia avatar Jan 16 '25 11:01 Felankia

I ran more test. The threshold is not a deterministic value. It somewhere between 72026 to 75000. If less than 72026, it works every time. If more than 75000, it crash every time.

Felankia avatar Jan 17 '25 01:01 Felankia

I think you'll need to include a lot of information about your running environment.

stevage avatar Jan 17 '25 03:01 stevage

I think you'll need to include a lot of information about your running environment.

CPU: i7-1165g7 Mem: 16G OS: Ubuntu 20.04.5 LTS nodejs: 14.21.2 npm: 6.14.17

Felankia avatar Jan 18 '25 05:01 Felankia

Had a go at reproducing this and was able to run 100,000 points ok. 500,000 hadn't returned after about 20 minutes, though hadn't crashed either. This was on a garden variety Mac air.

Wondering if you would mind commenting @mourner? Should rbush be able to handle an arbitrary number of points? Or will there be a limit for each system where memory is exhausted?

smallsaucepan avatar May 15 '25 12:05 smallsaucepan

@smallsaucepan perhaps RBush isn't the best library to use in this case, because it's meant for dynamic cases where you have to add or remove points often. For static cases where you index a bunch of points once, I'd look at KDBush (and geokdbush to support geographic points). I implemented a variation of DBSCAN in https://github.com/mapbox/dobbyscan which should process 500k without an issue — check it out.

mourner avatar May 15 '25 12:05 mourner

Fantastic! Thank for the leads @mourner. Will take a look.

smallsaucepan avatar May 17 '25 02:05 smallsaucepan

@Felankia have seen some promising results following one of the suggestions above. Previous response times in the below graph are blue, using geokdbush instead gives us the red line:

Image

Working on a PR now to use the faster library.

smallsaucepan avatar May 17 '25 11:05 smallsaucepan