Segmentation fault when using clustersDbscan
Version: 7.2.0
Code:
```js
const turf = require('@turf/turf');

let points = [/* ~1 million GPS Point features, loaded elsewhere */];
points.splice(72026); // keep only the first 72026 points (the crash threshold)
let clustered = turf.clustersDbscan(turf.featureCollection(points), 0.1);
```
Stacktrace:
```
Segmentation fault (core dumped)
```
And I can't find any core dump file in the JS file's folder.
I ran more tests. The threshold is not deterministic; it lies somewhere between 72,026 and 75,000 points. With fewer than 72,026 points it works every time; with more than 75,000 it crashes every time.
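In case it helps with reproducing, here is a self-contained version using random points (turf.randomPoint stands in for my real GPS data, so the exact threshold may differ):

```js
const turf = require('@turf/turf');

// Random points stand in for the real GPS data; the crash threshold may differ.
const N = 75000; // above the range where it crashes every time for me
const points = turf.randomPoint(N, { bbox: [-180, -90, 180, 90] });
const clustered = turf.clustersDbscan(points, 0.1); // maxDistance in km
console.log(clustered.features.length);
```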
I think you'll need to include a lot of information about your running environment.
CPU: i7-1165G7
Memory: 16 GB
OS: Ubuntu 20.04.5 LTS
Node.js: 14.21.2
npm: 6.14.17
Had a go at reproducing this and was able to run 100,000 points OK. 500,000 hadn't returned after about 20 minutes, though it hadn't crashed either. This was on a garden-variety MacBook Air.
Wondering if you would mind commenting @mourner? Should rbush be able to handle an arbitrary number of points? Or will there be a limit for each system where memory is exhausted?
@smallsaucepan perhaps RBush isn't the best library to use in this case, because it's meant for dynamic cases where you have to add or remove points often. For static cases where you index a bunch of points once, I'd look at KDBush (and geokdbush to support geographic points). I implemented a variation of DBSCAN in https://github.com/mapbox/dobbyscan which should process 500k without an issue — check it out.
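Roughly, the static approach looks like this (a sketch only: this uses the kdbush v3 constructor API, and the array-of-[lng, lat] shape is an assumption about the input, not Turf's internals):

```js
// Sketch only: kdbush v3 API; `points` as GeoJSON Point features is assumed.
const KDBush = require('kdbush');
const geokdbush = require('geokdbush');

// Extract [lng, lat] pairs and build the static index once, up front.
const coords = points.map((pt) => pt.geometry.coordinates);
const index = new KDBush(coords); // default accessors read p[0] / p[1]

// All neighbours within 0.1 km of the first point (Infinity = no result cap).
const near = geokdbush.around(index, coords[0][0], coords[0][1], Infinity, 0.1);
```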
Fantastic! Thanks for the leads @mourner. Will take a look.
@Felankia I've seen some promising results following one of the suggestions above. Previous response times in the graph below are blue; using geokdbush instead gives us the red line:
Working on a PR now to use the faster library.
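In the meantime, anyone wanting to compare locally could use a harness along these lines (the sizes and random data are illustrative; this is not the exact benchmark behind the graph above):

```js
// Illustrative timing harness; sizes and random data are arbitrary.
const turf = require('@turf/turf');

for (const n of [10000, 50000, 100000, 500000]) {
  const fc = turf.randomPoint(n, { bbox: [-180, -90, 180, 90] });
  const start = Date.now();
  turf.clustersDbscan(fc, 0.1); // maxDistance in kilometres
  console.log(`${n} points: ${Date.now() - start} ms`);
}
```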