
Integrate Natural Breaks (Jenks)

Open • dnomadb opened this issue 11 years ago • 7 comments

So far make-surface supports equal interval breaks, quantile breaks, any hybrid of the two, or manual classification for vectorization. Integrate a natural breaks (Jenks) classifier that runs on random raster samples (for speed).

cc: @heyitsgarrett @andreasviglakis

dnomadb • Nov 07 '14 18:11
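A minimal sketch of the sampling idea in plain Python: draw a random subset of valid cell values, compute breaks on that subset, then classify every cell against those breaks. The raster is assumed to be a list of rows with a single nodata value (in practice it would come from a raster library), and `sample_raster_values` and `classify` are hypothetical helpers, not make-surface functions. Breaks are assumed to be `[min, upper bound of class 1, ..., max]`, which is also how manual breaks would be passed in.

```python
import bisect
import random


def sample_raster_values(raster, n_samples, nodata=None, seed=None):
    """Return up to n_samples randomly chosen valid values from a 2D raster.

    `raster` is a list of rows (a stand-in for a band read from a file);
    cells equal to `nodata` are skipped before sampling.
    """
    values = [v for row in raster for v in row if v != nodata]
    if len(values) <= n_samples:
        return values  # small rasters: just use every valid cell
    rng = random.Random(seed)  # seeded so a given run is reproducible
    return rng.sample(values, n_samples)


def classify(value, breaks):
    """Return the 0-based class of `value` for breaks [min, ub1, ..., max].

    Upper bounds are inclusive: a value equal to ub1 falls in class 0.
    """
    return bisect.bisect_left(breaks[1:-1], value)


if __name__ == "__main__":
    raster = [[1, 2, -9999], [4, 5, 6]]
    sample = sample_raster_values(raster, 3, nodata=-9999, seed=0)
    print(sample)  # three of 1, 2, 4, 5, 6; never -9999
    manual_breaks = [1, 2, 5, 6]  # three classes: [1, 2], (2, 5], (5, 6]
    print([[classify(v, manual_breaks) for v in row if v != -9999] for row in raster])
    # [[0, 0], [1, 1, 2]]
```

Only the classifier sees the reduced sample; the final per-cell lookup still visits the whole raster, but that step is a cheap binary search per cell.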

+1

ian29 • Nov 07 '14 21:11

@tmcw wrote a great post a couple of years ago describing a literate Jenks classifier that I have used all over the place. The implementation is included in simple-statistics.

morganherlocker • Nov 07 '14 23:11
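For reference, here is a minimal Python sketch of the two-matrix dynamic program that classic Jenks implementations use: one matrix records where the last class starts, the other the total within-class sum of squared deviations (SSD), and a walk back from the full data set recovers the breaks. It is not a copy of the simple-statistics source; `jenks_breaks` and its return convention (`[min, upper bound of class 1, ..., max]`) are assumptions. The walk-back starts from `len(data)` so the last value is included. Cost grows as O(k·n²) in the number of values, which is why running it on a random sample matters for rasters.

```python
def jenks_breaks(values, n_classes):
    """Fisher-Jenks natural breaks: [min, upper bound of class 1, ..., max]."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= len(values)")

    inf = float("inf")
    # lower[end][j]: 1-based position where the last class starts in the best
    # split of the first `end` values into j classes; cost[end][j]: its total SSD.
    lower = [[0] * (n_classes + 1) for _ in range(n + 1)]
    cost = [[0.0] * (n_classes + 1) for _ in range(n + 1)]
    for j in range(1, n_classes + 1):
        lower[1][j] = 1
        for end in range(2, n + 1):
            cost[end][j] = inf

    for end in range(2, n + 1):
        s = sq = ssd = 0.0
        for m in range(1, end + 1):
            first = end - m + 1          # candidate start of the last class
            x = data[first - 1]
            s += x
            sq += x * x
            ssd = sq - s * s / m         # SSD of data[first - 1 : end]
            prev = first - 1             # values left for the earlier classes
            if prev:
                for j in range(2, n_classes + 1):
                    candidate = ssd + cost[prev][j - 1]
                    if cost[end][j] >= candidate:
                        lower[end][j] = first
                        cost[end][j] = candidate
        lower[end][1] = 1
        cost[end][1] = ssd               # one class covering all `end` values

    # Walk back from all n values, reading off each class's upper bound.
    breaks = [0.0] * (n_classes + 1)
    breaks[0], breaks[n_classes] = data[0], data[-1]
    k = n
    for j in range(n_classes, 1, -1):
        first = lower[k][j]
        breaks[j - 1] = data[first - 2]  # last value before class j starts
        k = first - 1
    return breaks


if __name__ == "__main__":
    print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [1, 3, 12, 22]
```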

@morganherlocker Yes, I saw that; it's what led me to the Python implementation I'd been experimenting with (which was hecka slow).

I was thinking of two paths: either integrate @tmcw's implementation, or try to speed up the Python implementation above (it makes no use of NumPy ndarrays, which I would guess would be faster).

dnomadb • Nov 07 '14 23:11

@dnomadb Holy cow, that is absolutely the slowest possible Python code ever. Stuff like https://gist.github.com/drewda/1299198#file-gistfile1-py-L15-L16 makes me head-desk! You don't even need to go the NumPy/Cython route, just replace every

```python
for i in range(number):
    x.append(val)
```

with `x = list(val for i in range(number))`. Bang! 10X speedup. And if that's not enough, we can Cythonize stuff and get C speed.

sgillies • Nov 07 '14 23:11
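A quick way to check the suggestion on your own machine is to time the patterns side by side. This sketch compares the append loop, the generator-to-list form from the comment above, a list comprehension, and `[val] * number`; `number` and `val` are placeholder values, and no particular speedup is assumed, since timings vary by interpreter and hardware.

```python
import timeit

number = 100_000  # placeholder size
val = 0.0         # placeholder (immutable) value


def with_append():
    x = []
    for i in range(number):
        x.append(val)
    return x


def with_generator():
    return list(val for i in range(number))


def with_comprehension():
    return [val for _ in range(number)]


def with_repetition():
    # Fine for immutable values; with a mutable val, every slot would share one object.
    return [val] * number


if __name__ == "__main__":
    for fn in (with_append, with_generator, with_comprehension, with_repetition):
        seconds = timeit.timeit(fn, number=10)
        print(f"{fn.__name__:<20} {seconds:.3f}s for 10 runs")
```

All four build the same list, so whichever measures fastest can be swapped in without changing results.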

As per @morganherlocker's suggestion, I ported @tmcw's implementation in simple-statistics to Python. Along the way, I went after the low-hanging fruit for speed: replacing all list.append() calls with preallocated arrays, etc.

https://github.com/mapbox/make-surface/blob/classify-modularize/makesurface/scripts/classifiers.py

dnomadb • Nov 14 '14 20:11
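When porting matrix-heavy code like Jenks, preallocating with list multiplication has one trap worth showing: `[[0.0] * cols] * rows` repeats a single row object, so writing one cell changes every row. A minimal sketch of safe preallocation follows; the helper names are hypothetical. Whether preallocation actually beats `append()` in CPython is worth measuring, since `append()` is amortized O(1). With NumPy, `numpy.zeros((rows, cols))` gives independent cells directly.

```python
def preallocate_matrix(rows, cols, fill=0.0):
    """Return a rows x cols list of lists in which every row is a separate list."""
    return [[fill] * cols for _ in range(rows)]


def prefix_sums(values):
    """Cumulative sums with a leading 0, written by index into a preallocated list."""
    out = [0.0] * (len(values) + 1)
    for i, v in enumerate(values):
        out[i + 1] = out[i] + v
    return out


if __name__ == "__main__":
    safe = preallocate_matrix(3, 2)
    safe[0][0] = 1.0
    print(safe)      # [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

    aliased = [[0.0] * 2] * 3   # the trap: three references to one row
    aliased[0][0] = 1.0
    print(aliased)   # [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

    print(prefix_sums([1, 2, 3]))  # [0.0, 1.0, 3.0, 6.0]
```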

I never really found the time to do it, but there's an improved and simplified algorithm published recently: http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf

(Also... I've found that quantiles tend to work very well compared with Jenks, and they're simpler to implement and understand, so they're my go-to for unpredictable distributions.)

tmcw • Nov 14 '14 20:11
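The linked paper (Wang and Song, The R Journal, 2011) introduces Ckmeans.1d.dp, which computes an optimal one-dimensional k-means clustering by dynamic programming. Because Jenks and 1-D k-means both minimize the total within-class sum of squared deviations, they should find the same partition on the same input, up to ties. The sketch below is a minimal O(k·n²) version of that idea, using prefix sums for constant-time class costs; it is not a port of the authors' package, and `ckmeans` is a hypothetical name.

```python
def ckmeans(values, k):
    """Split values into k contiguous clusters minimizing total within-cluster SSD."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")

    # Prefix sums give the SSD of any run xs[i..j] in O(1).
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations of xs[i..j] (inclusive) from their mean."""
        s = ps[j + 1] - ps[i]
        return (ps2[j + 1] - ps2[i]) - s * s / (j - i + 1)

    inf = float("inf")
    # cost[m][j]: best SSD for xs[0..j] split into m + 1 clusters;
    # start[m][j]: index where the last of those clusters begins.
    cost = [[inf] * n for _ in range(k)]
    start = [[0] * n for _ in range(k)]
    for j in range(n):
        cost[0][j] = ssd(0, j)
    for m in range(1, k):
        for j in range(m, n):
            for i in range(m, j + 1):  # last cluster is xs[i..j]
                c = cost[m - 1][i - 1] + ssd(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    start[m][j] = i

    # Backtrack from the full data set to recover the clusters.
    clusters = []
    j = n - 1
    for m in range(k - 1, -1, -1):
        i = start[m][j] if m > 0 else 0
        clusters.append(xs[i : j + 1])
        j = i - 1
    return clusters[::-1]


if __name__ == "__main__":
    print(ckmeans([22, 1, 12, 2, 3, 10, 11, 20, 21], 3))
    # [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
```

Class upper bounds follow as `[c[-1] for c in clusters]`. For large raster values, subtracting big prefix sums can lose floating-point precision; shifting the values (for example by their median) before clustering avoids most of that without changing the partition.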

Awesome. I already have equal interval and quantile, and added the ability to make a weighted hybrid of the two. I'm also thinking of integrating harmonic means (name?), where you iteratively split at the mean, then split those splits at their means, and so on.

dnomadb • Nov 14 '14 20:11
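For comparison, a minimal sketch of equal-interval and quantile breaks plus the recursive mean-splitting scheme described above, which is often called nested means in cartography. All three return `[min, upper bound of class 1, ..., max]`; the function names are hypothetical, and the weighted hybrid is left out because the thread does not say how the two schemes are combined. With `depth` levels of splitting, nested means yields up to 2**depth classes.

```python
def equal_interval_breaks(values, n_classes):
    """Split the value range into n_classes equal-width classes."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + i * step for i in range(n_classes)] + [hi]


def quantile_breaks(values, n_classes):
    """Choose upper bounds so each class holds roughly the same number of values."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= len(values)")
    inner = [xs[(i * n) // n_classes - 1] for i in range(1, n_classes)]
    return [xs[0]] + inner + [xs[-1]]


def nested_means(values, depth):
    """Split at the mean, then split each half at its own mean, depth times."""
    xs = sorted(values)
    cuts = []

    def split(chunk, level):
        if level == 0 or len(chunk) < 2:
            return
        mean = sum(chunk) / len(chunk)
        lower = [x for x in chunk if x <= mean]
        upper = [x for x in chunk if x > mean]
        if not upper:  # all values equal: nothing to split
            return
        split(lower, level - 1)  # in-order recursion keeps cuts sorted
        cuts.append(mean)
        split(upper, level - 1)

    split(xs, depth)
    return [xs[0]] + cuts + [xs[-1]]


if __name__ == "__main__":
    skewed = [1, 1, 2, 2, 3, 3, 4, 100]
    print(equal_interval_breaks(skewed, 4))  # [1.0, 25.75, 50.5, 75.25, 100]
    print(quantile_breaks(skewed, 4))        # [1, 1, 2, 3, 100]
    print(nested_means(range(1, 9), 2))      # [1, 2.5, 4.5, 6.5, 8]
```

On the skewed sample, equal intervals put seven of the eight values in the first class, while quantile breaks spread them out but repeat the break 1 because of ties; a real implementation would need to collapse duplicate breaks.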