StatsBase.jl

bin and reduce

Open jerlich opened this issue 1 year ago • 0 comments

I very regularly want to bin 2D data on the first coordinate and then apply a reducing function (such as `mean`) to the second coordinate within each bin.

I wrote this for the 2D case, but I have similar functions for the 3D case (bin on x, y and apply a function to z):

using StatsBase  # provides `fit` and `Histogram`

"""
`binned(x, y, bins, μ, Ε)`

Takes vectors `x, y` of equal length, then bins `x` according to `bins`. Applies `μ` and `Ε` to the values of `y` grouped by the bins of `x`.

`μ` is any function that takes a vector and returns a number (usually `mean`).
`Ε` is any function that takes a vector and returns a 2-element iterable, which represents the lower and upper CI for that bin.

Returns a 3-tuple `(bin_c, μ_y, E_y)`, where `E_y` is a 2-tuple of vectors holding the distances from `μ_y` to the lower and upper CI, such that you can plot the results with `plot(bin_c, μ_y, yerr=E_y)`. Empty bins yield `NaN`.
"""
function binned(x, y, bins, μ, Ε)
    @assert length(x) == length(y)
    h = fit(Histogram, x, bins)

    # Bin centers
    ox = (bins[1:end-1] + bins[2:end]) / 2
    # Index of the bin that each element of x falls into
    xmap = StatsBase.binindex.(Ref(h), x)

    # Reduce y within each bin; empty bins yield NaN
    oy = [any(z .== xmap) ? μ(y[z .== xmap]) : NaN for z in 1:length(ox)]
    oe = [any(z .== xmap) ? Ε(y[z .== xmap]) : [NaN, NaN] for z in 1:length(ox)]
    # `oe` is a long list of 2-element results, but we want a 2-tuple of vectors
    (ox, oy, (oy .- getindex.(oe, 1), getindex.(oe, 2) .- oy))
end

Would something like this (but cleaned up, tested, etc.) fit into StatsBase?
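
As a rough idea of what a cleaned-up version might look like, here is a minimal sketch that groups `y` in a single pass rather than rescanning the bin indices for every bin. It only needs the `Statistics` standard library. The name `binned_sketch`, the left-closed bin convention, the dropping of out-of-range points, and the sample data are all assumptions for illustration, not anything that exists in StatsBase:

```julia
using Statistics  # standard library; provides `mean` and `quantile`

# Hypothetical helper (not a StatsBase function): bin `x` by `edges`,
# then reduce the `y` values of each bin with `μ` and `ci`.
# Assumption: bins are left-closed, [edges[i], edges[i+1]), and points
# that fall outside the edges are silently dropped.
function binned_sketch(x, y, edges, μ, ci)
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have equal length"))
    nbins = length(edges) - 1
    centers = [(edges[i] + edges[i+1]) / 2 for i in 1:nbins]

    # Group the y values by bin in one pass over the data.
    groups = [eltype(y)[] for _ in 1:nbins]
    for (xi, yi) in zip(x, y)
        i = searchsortedlast(edges, xi)  # index of the last edge <= xi
        1 <= i <= nbins && push!(groups[i], yi)
    end

    # Empty bins yield NaN for both the estimate and the interval.
    m  = [isempty(g) ? NaN : μ(g) for g in groups]
    e  = [isempty(g) ? (NaN, NaN) : ci(g) for g in groups]
    lo = getindex.(e, 1)
    hi = getindex.(e, 2)

    # Error bars are returned as distances below and above the estimate.
    return centers, m, (m .- lo, hi .- m)
end

# Usage with made-up data: per-bin mean and 2.5%/97.5% quantiles.
x = [0.1, 0.4, 0.6, 0.9, 1.5, 1.7]
y = [1.0, 3.0, 2.0, 4.0, 10.0, 20.0]
ci95(v) = quantile(v, [0.025, 0.975])
centers, m, (elo, ehi) = binned_sketch(x, y, 0.0:0.5:2.0, mean, ci95)
# The third bin, [1.0, 1.5), is empty, so m[3] is NaN.
```

Grouping once keeps the cost proportional to the number of points, whereas comparing every bin index against `xmap` scales with points times bins.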

jerlich, Sep 19 '23 19:09