
Thematic mapping utilities

Open ericemc3 opened this issue 4 years ago • 4 comments

An extension to op.ntile() could prove useful for encoding numeric values as categories using manual breaks: something similar to R's cut function, e.g. dens_code = cut(pop_density, breaks = c(0, 1000, 5000, 20000, 100000, Inf), ...), or to d3.scaleThreshold().
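For reference, R's cut() with its defaults (right = TRUE, include.lowest = FALSE) assigns each value to a right-closed interval (a, b] and returns NA for values outside the breaks. A minimal JavaScript sketch of that behavior, assuming a hypothetical standalone function cutLabels (not an Arquero API) and simple "(a,b]" labels rather than R's exact label formatting:

```javascript
// Hypothetical helper, not part of Arquero: mimics R's cut() defaults,
// i.e. right-closed intervals (a, b], with out-of-range values mapped to null.
function cutLabels(value, breaks) {
  // breaks must be sorted ascending; Infinity is allowed as the last break
  for (let i = 1; i < breaks.length; i++) {
    if (value > breaks[i - 1] && value <= breaks[i]) {
      return `(${breaks[i - 1]},${breaks[i]}]`;
    }
  }
  // value is <= the first break, above the last break, or NaN
  return null;
}

const breaks = [0, 1000, 5000, 20000, 100000, Infinity];
console.log([250, 1000, 7500, 250000, 0].map((d) => cutLabels(d, breaks)));
```

Note that a value equal to the lowest break (0 here) gets no category, just as in R unless include.lowest = TRUE.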

jenks() and kmeans() are also useful clustering methods. We could borrow them from the simple-statistics library, but having them in Arquero would of course be convenient.
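Unlike manual breaks, these methods derive the breaks from the data. As a rough illustration of the k-means idea only, here is a one-dimensional Lloyd's k-means sketch; kmeansBreaks is a hypothetical name, and this is neither Jenks natural breaks nor the simple-statistics implementation. It returns up to k - 1 breaks, each being the smallest value of a cluster above the lowest one, so they suit a left-closed "x >= break" recoding.

```javascript
// Hypothetical helper (not an Arquero or simple-statistics API):
// 1-D k-means (Lloyd's algorithm) that returns up to k - 1 class breaks.
function kmeansBreaks(values, k, maxIter = 100) {
  // Keep finite numbers only, sorted ascending (filter copies, so input is untouched)
  const data = values.filter(Number.isFinite).sort((a, b) => a - b);
  if (k < 2 || data.length < k) return [];

  // Initialise centroids at evenly spaced positions in the sorted data
  let centroids = Array.from(
    { length: k },
    (_, i) => data[Math.floor(((i + 0.5) * data.length) / k)]
  );
  const labels = new Array(data.length).fill(0);

  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: each value joins its nearest centroid
    for (let j = 0; j < data.length; j++) {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (Math.abs(data[j] - centroids[c]) < Math.abs(data[j] - centroids[best])) {
          best = c;
        }
      }
      labels[j] = best;
    }
    // Update step: move each centroid to the mean of its members
    const sums = new Array(k).fill(0);
    const counts = new Array(k).fill(0);
    for (let j = 0; j < data.length; j++) {
      sums[labels[j]] += data[j];
      counts[labels[j]] += 1;
    }
    const next = centroids.map((m, c) => (counts[c] > 0 ? sums[c] / counts[c] : m));
    const converged = next.every((m, c) => m === centroids[c]);
    centroids = next;
    if (converged) break;
  }

  // In 1-D each cluster is a contiguous run of the sorted data, so a break
  // is the first value where the cluster label changes
  const breaks = [];
  for (let j = 1; j < data.length; j++) {
    if (labels[j] !== labels[j - 1]) breaks.push(data[j]);
  }
  return breaks;
}

console.log(kmeansBreaks([52, 1, 11, 3, 50, 10, 2, 12, 51], 3)); // → [10, 50]
```

Plain k-means depends on its initialisation; the simple-statistics library's optimal 1-D methods avoid that, which is one argument for borrowing them rather than rolling your own.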

ericemc3, Dec 28 '20 13:12

santoku is a feature-rich R package that tries to improve on cut. I don't think Arquero needs this many features, but it could offer some API inspiration, in addition to d3 and other patterns in the JavaScript world.

jcmkk3, Dec 28 '20 17:12

I'd be happy to consider a new cut / chop / etc. implementation for inclusion in Arquero. Like recode, it might be added as a new standard op function.

As for clustering algorithms, I think those might be more fitting as extensions defined in a separate package, as discussed in #67.

jheer, Dec 29 '20 10:12

Great, thanks!

ericemc3, Dec 29 '20 11:12

A simple implementation of an op.cut could just be the following. Given, for instance, breaks = [t1, t2, t3], recode x as:
x ∈ [min, t1) => 0
x ∈ [t1, t2) => 1
x ∈ [t2, t3) => 2
x ∈ [t3, max] => 3
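This recoding amounts to counting how many breaks are less than or equal to x, which a binary search computes in O(log n) per value; it follows the same left-closed convention as d3.scaleThreshold, where a value equal to a threshold moves up a class. A minimal sketch, assuming breaks sorted ascending; cutIndex is a hypothetical name, not an existing Arquero op:

```javascript
// Hypothetical cutIndex: class index of x for sorted breaks [t1, ..., tn]:
// x < t1 -> 0, t1 <= x < t2 -> 1, ..., x >= tn -> n.
function cutIndex(x, breaks) {
  if (x == null || Number.isNaN(x)) return null; // missing values stay missing
  let lo = 0;
  let hi = breaks.length;
  // Binary search for the number of breaks that are <= x ("bisect right")
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (breaks[mid] <= x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

const breaks = [1000, 5000, 20000];
console.log([500, 1000, 4999, 20000, 1e6].map((x) => cutIndex(x, breaks))); // → [0, 1, 1, 3, 3]
```

The sketch never needs min and max: values below t1 or above t3 simply fall into the first or last class.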

ericemc3, Dec 29 '20 17:12