make-surface
Integrate Natural Breaks (Jenks)
So far, make-surface supports equal interval breaks, quantile breaks, any hybrid of the two, and manual classification for vectorization. Integrate a natural breaks classifier that runs on random raster samples (for speed).
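For the sampling half of this, a minimal sketch of what "random raster samples" could look like, assuming the raster is already in memory as a list of rows. The function name `sample_raster` and the `nodata` parameter are hypothetical and not part of make-surface.

```python
import random


def sample_raster(rows, n_samples, nodata=None, seed=None):
    """Return up to n_samples randomly chosen valid cell values from a 2D grid.

    rows is a list of rows (lists of numbers); cells equal to nodata are skipped.
    """
    # Flatten the grid and drop nodata cells before sampling.
    values = [v for row in rows for v in row if v != nodata]
    if n_samples >= len(values):
        return values
    # Sample without replacement so no cell is counted twice.
    return random.Random(seed).sample(values, n_samples)


raster = [[1, 2, -9999], [3, 4, 5]]
sample = sample_raster(raster, 3, nodata=-9999, seed=0)
```

The classifier would then run on `sample` instead of every cell, trading some accuracy in the breaks for a much smaller input.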
cc: @heyitsgarrett @andreasviglakis
+1
@tmcw wrote up a great post a couple years ago describing a literate jenks classifier that I have used all over the place. The implementation is included in simple-statistics.
@morganherlocker Yes, I saw that, which is what led me to the Python implementation that I'd been experimenting with (which was hecka slow).
I was thinking of two paths: either integrating @tmcw's implementation, or attempting to speed up the above implementation (it does not use NumPy ndarrays, which I would guess would be faster).
@dnomadb Holy cow, that is absolutely the slowest possible Python code ever. Stuff like https://gist.github.com/drewda/1299198#file-gistfile1-py-L15-L16 makes me head-desk! You don't even need to go the Numpy/Cython route, just replace every
    for i in range(number):
        x.append(val)
with x = list(val for i in range(number)). Bang! 10X speedup. And if that's not enough, we can Cythonize stuff and get C speed.
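To make the comparison concrete, here is a small sketch of the append loop next to the suggested replacement, plus `[val] * number` as a third option. The function names are made up for illustration, and the actual speedup will depend on the interpreter and list size, so it is worth measuring with `timeit` rather than trusting any fixed number.

```python
def fill_with_append(val, number):
    # The pattern from the gist: grow the list one append at a time.
    x = []
    for i in range(number):
        x.append(val)
    return x


def fill_with_generator(val, number):
    # The suggested replacement: build the list in a single expression.
    return list(val for i in range(number))


def fill_with_repeat(val, number):
    # Another option: list repetition. Every slot refers to the same
    # object, which is fine for numbers but not for mutable values.
    return [val] * number


# To measure on your own machine:
#   import timeit
#   timeit.timeit(lambda: fill_with_append(0, 10000), number=100)
```

All three return the same list, so swapping one for another should not change the classifier's output.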
As per @morganherlocker's suggestion, I ported @tmcw's implementation in simple-statistics to Python. Along the way, I picked up the easy speed gains, such as replacing all list.append() calls with preallocated lists.
https://github.com/mapbox/make-surface/blob/classify-modularize/makesurface/scripts/classifiers.py
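For readers who want the idea without opening the file, here is a minimal sketch of a natural breaks classifier. It is not the code in classifiers.py or simple-statistics: it is a plain dynamic program over prefix sums that minimizes the within-class sum of squared deviations, which is the objective usually associated with Jenks. The name `jenks_breaks` and the return convention (lower bound of each class, then the maximum) are assumptions for this example.

```python
def jenks_breaks(values, n_classes):
    """Return n_classes + 1 break values for 1D natural breaks."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums of values and squared values, so the cost of any
    # contiguous class can be computed in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        # Sum of squared deviations from the mean for data[i:j], j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total ssd when data[:j] is split into c classes.
    # start[c][j]: index where the last of those c classes begins.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back from the last class to recover each class's first value.
    breaks = [data[-1]]
    j = n
    for c in range(n_classes, 0, -1):
        i = start[c][j]
        breaks.append(data[i])
        j = i
    breaks.reverse()
    return breaks


print(jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3))  # [1, 10, 50, 52]
```

This version does on the order of n_classes × n² work, which is one reason to run it on a sample rather than on every raster cell.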
I never really found the time to do it, but there's a recently published improved and simplified version: http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf
(also... I found that quantiles tend to be very good relative to jenks and simpler to implement and understand so they're my go-to for non-predictable distributions)
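As a point of comparison on the "simpler to implement" claim, a quantile classifier can be sketched in a few lines. This is an illustrative version using linear interpolation between sorted values, not the make-surface implementation; the name `quantile_breaks` is hypothetical.

```python
def quantile_breaks(values, n_classes):
    """Return n_classes + 1 breaks that split the data into equal-count classes."""
    data = sorted(values)
    n = len(data)
    if n == 0 or n_classes < 1:
        raise ValueError("need at least one value and one class")
    breaks = []
    for k in range(n_classes + 1):
        # Fractional position of the k-th quantile in the sorted data.
        pos = k * (n - 1) / n_classes
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        # Interpolate linearly between the two neighbouring values.
        breaks.append(data[lo] + (data[hi] - data[lo]) * (pos - lo))
    return breaks


print(quantile_breaks([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

There is no optimization step at all, just a sort and a lookup, which is most of why it is easier to reason about than Jenks.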
Awesome. I have equal interval and quantile already, and added the ability to make a weighted hybrid between the two. I'm also thinking I will integrate harmonic means (name?), where you iteratively split at the mean, then split those splits at their means, and so on.
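A rough sketch of both ideas, under stated assumptions. The thread does not say how the hybrid is weighted, so `blend_breaks` assumes a linear blend of two break lists of equal length. The recursive mean split is written as `nested_means_breaks`, since this procedure is sometimes called nested means; both names are hypothetical and this is not the make-surface code.

```python
def blend_breaks(breaks_a, breaks_b, weight):
    """Linearly blend two break lists; weight=0 gives breaks_a, weight=1 gives breaks_b."""
    if len(breaks_a) != len(breaks_b):
        raise ValueError("break lists must have the same length")
    return [(1 - weight) * a + weight * b for a, b in zip(breaks_a, breaks_b)]


def nested_means_breaks(values, depth):
    """Split at the mean, then split each half at its own mean, depth times.

    Returns the minimum, the interior means in ascending order, and the maximum.
    """
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")

    def split(chunk, level):
        # Stop when out of levels or when the chunk cannot be divided further.
        if level == 0 or len(chunk) < 2 or chunk[0] == chunk[-1]:
            return []
        mean = sum(chunk) / len(chunk)
        lower = [v for v in chunk if v < mean]
        upper = [v for v in chunk if v >= mean]
        return split(lower, level - 1) + [mean] + split(upper, level - 1)

    return [data[0]] + split(data, depth) + [data[-1]]


print(blend_breaks([0, 10, 20], [0, 4, 20], 0.5))        # [0.0, 7.0, 20.0]
print(nested_means_breaks([1, 2, 3, 4, 5, 6, 7, 8], 2))  # [1, 2.5, 4.5, 6.5, 8]
```

With this scheme a depth of d yields up to 2^d classes, so the class count is always a power of two unless some chunks run out of distinct values.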