RFC: Histograms with errors

Open jpata opened this issue 10 years ago • 5 comments

I wrote a package [1] which builds on StatsBase and implements histograms with errors on the bin contents. Such histograms are heavily used in high-energy physics at the Large Hadron Collider. This was discussed previously in https://github.com/JuliaStats/StatsBase.jl/issues/104.

In particular, we push values to the histogram with non-uniform weights and later model the bin counts using Poisson or Gaussian distributions.

Do you think this can be a part of StatsBase? If yes, I will prepare a PR soon. If not, I'll keep it as a separate package, but I would like to register it in Julia's METADATA under a name that does not cause confusion. Suggestions are welcome.

[1] https://github.com/jpata/Histograms.jl/blob/master/src/Histograms.jl

jpata avatar Oct 06 '15 09:10 jpata

I don't know this kind of histogram. Do you have an easy reference for it? Preferably with a picture. My feeling is that StatsBase.jl might not be the right place for this. The package is meant for things that many other statistical packages would like to use and I'm not sure this functionality falls into that category.

andreasnoack avatar Oct 06 '15 13:10 andreasnoack

Here's a paper where the idea is described: http://arxiv.org/pdf/0712.4250.pdf

Here's the Higgs boson discovery plot, where this is used to draw the error bars and also to infer the statistical uncertainty (not shown) on the background.

[image]

The error bars are derived from storing the squared weights (weight != 1 in push!(h::ErrorHistogram, values, weight)) in a separate N-dimensional array. This also allows us to do error propagation on the bins.
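As a rough illustration of that bookkeeping, here is a minimal 1-D sketch in Julia. The type `SketchHistogram`, its fields `sumw` and `sumw2`, and the helper `errors` are hypothetical names made up for this example and are not the Histograms.jl API. It assumes the per-bin uncertainty is taken as the square root of the sum of squared weights, and that two independently filled histograms are added by summing both arrays.

```julia
# Hypothetical 1-D sketch of a histogram with per-bin errors.
# Not the actual Histograms.jl implementation.
struct SketchHistogram
    edges::Vector{Float64}   # sorted bin edges, length nbins + 1
    sumw::Vector{Float64}    # per-bin sum of weights (the bin contents)
    sumw2::Vector{Float64}   # per-bin sum of squared weights
end

# Build an empty histogram from any iterable of bin edges.
SketchHistogram(edges) = SketchHistogram(collect(Float64, edges),
                                         zeros(length(edges) - 1),
                                         zeros(length(edges) - 1))

# Fill one value with a weight; values outside the edges are ignored.
function Base.push!(h::SketchHistogram, x::Real, w::Real=1.0)
    i = searchsortedlast(h.edges, x)    # largest i with edges[i] <= x
    if 1 <= i < length(h.edges)
        h.sumw[i] += w
        h.sumw2[i] += w^2
    end
    return h
end

# Assumed error estimate: sqrt of the summed squared weights in each bin.
errors(h::SketchHistogram) = sqrt.(h.sumw2)

# Error propagation for a sum of independent histograms:
# contents add, and so do the squared-weight sums.
function Base.:+(a::SketchHistogram, b::SketchHistogram)
    a.edges == b.edges || throw(ArgumentError("bin edges differ"))
    return SketchHistogram(a.edges, a.sumw .+ b.sumw, a.sumw2 .+ b.sumw2)
end

h = SketchHistogram(0.0:1.0:3.0)   # three bins: [0,1), [1,2), [2,3)
push!(h, 0.5, 2.0)
push!(h, 0.7, 0.5)
push!(h, 1.2)                      # default weight of 1
```

With unit weights the squared-weight sum equals the bin count, so the error reduces to the familiar square root of the count.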

I see your point about this package being a kind of base dependency for all things Stats. Do you think it's worth putting it under JuliaStats? In that case, I'll see if I can have Histograms.jl merged into METADATA under some reasonable name. Physicists are now starting to use these packages, and it's better to have them in METADATA with proper testing than in a private repo.

jpata avatar Oct 06 '15 13:10 jpata

I would lean toward making this a package for now. Maybe WeightedHistograms.jl?

simonbyrne avatar Oct 06 '15 14:10 simonbyrne

@jpata Placing a repo under an organization is mainly beneficial if the repo has several contributors. As long as it is mainly your package, it is just as fine to keep it under your personal profile.

andreasnoack avatar Oct 18 '15 20:10 andreasnoack

This issue can probably be closed?

nilshg avatar Mar 28 '23 14:03 nilshg