StatsBase.jl
bin and reduce
I regularly want to bin some 2D data on the first coordinate and then apply a reducing function (usually `mean`) to the second coordinate within each bin.
I wrote this for the 2D case, but I have similar functions for the 3D case (bin on `x` and `y`, then apply the function to `z`).
```julia
using StatsBase

"""
    binned(x, y, bins, μ, E)

Bin the vector `x` by the edges `bins`, then apply `μ` and `E` to the values of
`y` whose corresponding `x` falls in each bin. `x` and `y` must have equal length.

`μ` is any function that takes a vector and returns a number (usually `mean`).
`E` is any function that takes a vector and returns a 2-element iterable holding the
lower and upper bounds of a confidence interval for that bin.

Returns a 3-tuple `(bin_c, μ_y, E_y)`, where `E_y` holds the distances from `μ_y` down
to the lower bound and up to the upper bound, so the result can be plotted with
`plot(bin_c, μ_y, yerr=E_y)`. Empty bins give `NaN`.
"""
function binned(x, y, bins, μ, E)
    @assert length(x) == length(y)
    h = fit(Histogram, x, bins)
    edges = h.edges[1]
    ox = (edges[1:end-1] .+ edges[2:end]) ./ 2   # bin centers
    xmap = StatsBase.binindex.(Ref(h), x)        # bin of each x; out-of-range values fall outside 1:length(ox)
    oy = [any(==(z), xmap) ? μ(y[xmap .== z]) : NaN for z in 1:length(ox)]
    oe = [any(==(z), xmap) ? E(y[xmap .== z]) : (NaN, NaN) for z in 1:length(ox)]
    # `oe` is a vector of (lower, upper) pairs, but `yerr` wants a 2-tuple of
    # vectors: the offsets below and above `oy`
    return (ox, oy, (oy .- first.(oe), last.(oe) .- oy))
end
```
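To illustrate the `E` argument, here is a minimal sketch of one possible interval function: a normal-approximation 95% confidence interval for the mean. The `normal_ci` name and the 1.96 multiplier are my own assumptions, not something StatsBase provides. The last lines show the bounds-to-offsets conversion that `binned` performs before handing the result to `yerr`.

```julia
using Statistics

# Hypothetical `E` argument for `binned`: a normal-approximation 95% confidence
# interval for the mean, returned as a (lower, upper) tuple.
# Note: `std` of a single value is NaN, so one-point bins get a NaN interval.
function normal_ci(v; z = 1.96)
    m = mean(v)
    half = z * std(v) / sqrt(length(v))
    return (m - half, m + half)
end

# `binned` turns each (lower, upper) pair into offsets from the bin's mean,
# the asymmetric form that `yerr` expects:
v = [1.0, 2.0, 3.0]
lo, hi = normal_ci(v)
offsets = (mean(v) - lo, hi - mean(v))   # both equal 1.96 * std(v) / sqrt(3)
```

With the function above it would be called as `binned(x, y, bins, mean, normal_ci)`.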
Would something like this (cleaned up, tested, etc.) fit into StatsBase?
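One possible shape for a general version, sketched under my own assumptions: a hypothetical `binreduce` (not an existing StatsBase function) that bins on any number of coordinates in a single pass, so the 2D case and the 3D variant (bin on `x` and `y`, reduce `z`) share one implementation. It swaps the `Histogram` object for `Base.searchsortedlast` on left-closed edges, and it returns only the reduced values, not bin centers or error offsets.

```julia
using Statistics

# Hypothetical helper, not part of StatsBase: group `v` by the bin that each
# point (coords[1][i], coords[2][i], ...) falls into and apply `f` per group.
# Bins are left-closed, [e[k], e[k+1]); points outside the edges are dropped.
# Returns an array with one entry per bin, `NaN` for empty bins.
function binreduce(f, coords::Tuple, edges::Tuple, v::AbstractVector)
    n = length(v)
    all(c -> length(c) == n, coords) ||
        throw(DimensionMismatch("coordinate and value vectors must have equal length"))
    dims = map(e -> length(e) - 1, edges)             # number of bins per dimension
    groups = [eltype(v)[] for _ in CartesianIndices(dims)]
    for i in 1:n                                     # single pass over the data
        # searchsortedlast: 0 below the first edge, length(e) at or above the last
        k = map((e, c) -> searchsortedlast(e, c[i]), edges, coords)
        all(map((kd, nd) -> 1 <= kd <= nd, k, dims)) || continue
        push!(groups[k...], v[i])
    end
    return [isempty(g) ? NaN : f(g) for g in groups]
end

# 2D case: bin on x, reduce y
x = [0.1, 0.4, 0.6, 0.9, 1.5]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
binreduce(mean, (x,), (0:0.5:1,), y)    # → [2.0, 6.0]; x = 1.5 is outside the edges

# 3D case: bin on (x, y), reduce z; the result is a 2×2 matrix indexed [x bin, y bin]
xs = [0.2, 0.2, 0.7, 0.7]
ys = [0.2, 0.7, 0.2, 0.7]
z  = [1.0, 2.0, 3.0, 4.0]
binreduce(mean, (xs, ys), ([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]), z)
```

Visiting each point once avoids building a boolean mask per bin, which is what the `oy`/`oe` comprehensions in `binned` do, so the cost is roughly O(n log b) for b edges instead of O(n · b).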