hist
[FEATURE] Non-uniform rebinning
Right now, histograms can only be rebinned by an integer factor via the hist.rebin indicator. It would be nice if there were some way to rebin an axis to arbitrary bin edges. For example, we might want to make a regular axis irregular only in a low-statistics region, without having to know the exact binning scheme when the histogram is constructed.
I'm not sure what the best approach would be. Maybe the existing hist.rebin class could be extended to accept something like hist.rebin(<new_axis>/<new bin edges>) to specify the new binning scheme?
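At its core, rebinning to new edges that coincide with existing edges means summing runs of adjacent bins. Below is a minimal pure-Python sketch of that step, assuming the new edges are given as indices into the old edge list and ignoring underflow/overflow. `rebin_by_edge_indices` is a hypothetical helper for illustration only, not part of hist.

```python
from typing import List, Sequence, Tuple


def rebin_by_edge_indices(
    values: Sequence[float],
    edges: Sequence[float],
    keep: Sequence[int],
) -> Tuple[List[float], List[float]]:
    """Merge adjacent bins so that only the old edges at positions `keep` remain.

    Hypothetical helper (not a hist API). Underflow/overflow are ignored:
    bins outside [edges[keep[0]], edges[keep[-1]]] are simply dropped.
    """
    if len(edges) != len(values) + 1:
        raise ValueError("need exactly one more edge than bins")
    if len(keep) < 2:
        raise ValueError("need at least two edges to define a bin")
    if any(b <= a for a, b in zip(keep, keep[1:])):
        raise ValueError("edge indices must be strictly increasing")
    if keep[0] < 0 or keep[-1] >= len(edges):
        raise ValueError("edge index out of range")

    new_edges = [edges[i] for i in keep]
    # Old bins keep[k] .. keep[k + 1] - 1 are summed into new bin k.
    new_values = [sum(values[lo:hi]) for lo, hi in zip(keep, keep[1:])]
    return new_edges, new_values


# Regular(5, -5, 5) edges; keep -5, -3, 3, 5 so the central three bins merge.
print(rebin_by_edge_indices([1, 2, 3, 4, 5], [-5, -3, -1, 1, 3, 5], [0, 1, 4, 5]))
```

A real implementation would also need to decide whether dropped bins go into the flow bins rather than being discarded.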
Below is my own implementation of the histogram manipulation we want. Right now it only handles NamedHist and is probably very slow for large histograms, but it is a good starting point for describing what we want to do. It would be run with something like:
rebin_hist(h, x=new_x_axis, y=new_y_axis)
https://gist.github.com/yimuchen/a5e200c001ef4ea01681a7dd8fe89162
A nice interface for this would be array indexing, e.g.:
import hist
from hist import Hist
h = Hist(hist.axis.Regular(5, -5, 5))
rebinned_h = h[[[0], [1, 2, 3], [4]]]
This would return a new histogram with variable binning, with the central three bins merged into one.
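The proposed h[[[0], [1, 2, 3], [4]]] semantics can be sketched on plain lists: each group becomes one new bin, and the new edges are the lower edge of each group's first bin plus the upper edge of the last bin. `merge_bin_groups` is a hypothetical helper; this indexing form is a proposal here, not an existing hist feature, and the sketch assumes groups must cover every bin exactly once.

```python
from typing import List, Sequence, Tuple


def merge_bin_groups(
    values: Sequence[float],
    edges: Sequence[float],
    groups: Sequence[Sequence[int]],
) -> Tuple[List[float], List[float]]:
    """Merge bins according to index groups such as [[0], [1, 2, 3], [4]].

    Hypothetical helper illustrating the proposed indexing semantics.
    """
    if len(edges) != len(values) + 1:
        raise ValueError("need exactly one more edge than bins")
    if not groups or any(len(g) == 0 for g in groups):
        raise ValueError("groups must be non-empty")
    flat = [i for g in groups for i in g]
    # Requiring exactly 0, 1, ..., n-1 in order guarantees that each group is a
    # contiguous run of bins and that every bin is used exactly once.
    if flat != list(range(len(values))):
        raise ValueError("groups must be contiguous, ordered, and cover every bin once")

    # Lower edge of each group's first bin, plus the upper edge of the last bin.
    new_edges = [edges[g[0]] for g in groups] + [edges[groups[-1][-1] + 1]]
    new_values = [sum(values[i] for i in g) for g in groups]
    return new_edges, new_values


print(merge_bin_groups([1, 2, 3, 4, 5], [-5, -3, -1, 1, 3, 5], [[0], [1, 2, 3], [4]]))
```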
+1 on this
If I may add, it would also be nice to be able to rebin a histogram using the binning of a second one:
>>> h1.rebin(h2.axes.edges)
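Rebinning h1 to h2's binning only makes sense if every target edge is also an edge of h1 (up to floating-point noise). A minimal sketch of that check, which converts target edge values into indices of the original edges, is shown below. `matching_edge_indices` and its tolerance defaults are assumptions for illustration; the target edges could come from, e.g., `h2.axes[0].edges`.

```python
import bisect
import math
from typing import List, Sequence


def matching_edge_indices(
    old_edges: Sequence[float],
    new_edges: Sequence[float],
    rel_tol: float = 1e-9,
    abs_tol: float = 1e-12,
) -> List[int]:
    """Return the index in `old_edges` of each value in `new_edges`.

    Hypothetical helper. Raises ValueError if a new edge does not coincide with
    an existing edge, since such a rebinning would have to split old bins.
    """
    indices = []
    for x in new_edges:
        i = bisect.bisect_left(old_edges, x)
        # Floating-point noise can put the matching edge just below or above x.
        for j in (i - 1, i):
            if 0 <= j < len(old_edges) and math.isclose(
                old_edges[j], x, rel_tol=rel_tol, abs_tol=abs_tol
            ):
                indices.append(j)
                break
        else:
            raise ValueError(f"edge {x!r} is not an edge of the original axis")
    return indices


print(matching_edge_indices([-5, -3, -1, 1, 3, 5], [-3, 1, 5]))
```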
I just ran into a setup where I was looking for such a feature as well. Both the ability to specify new bin edges explicitly, and the possibility to pick specific bins to be merged (like @swertz's example) would be very useful!
+1 on this. Having this functionality would make it a lot easier to produce quality plots for coffea-based analyses.
Others have probably done this too, but I've written a function for this for my own studies using hist: https://gist.github.com/kdlong/d697ee691c696724fc656186c25f8814
I think it differs from the previous implementation in that it uses np.add.reduceat, so it shouldn't be as slow. I have fought with some of the details (like handling the overflow and underflow when rebinning to a subset), and I think I've validated it, but I wouldn't swear in blood that there are no mistakes. Others can try it out if it's useful, and I could turn it into a PR if it goes in the direction the developers want.
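To show why np.add.reduceat fits this problem, here is a minimal NumPy sketch (not the code from the gist above) that rebins one axis of an array in a single vectorized call. It assumes the array includes one underflow and one overflow slot on that axis, as `h.view(flow=True)` would give for plain (non-weighted) storage with both flow bins enabled; bins cut off by the new range are folded into the flow bins. `rebin_axis_reduceat` is a hypothetical name. For Weight storage one would apply it to values and variances separately, since the variances of summed bins add.

```python
import numpy as np


def rebin_axis_reduceat(view: np.ndarray, axis: int, keep) -> np.ndarray:
    """Rebin `view` along `axis`, keeping only the old edges at indices `keep`.

    Hypothetical helper. Assumes flow layout on `axis`: slot 0 is underflow,
    old bin b is slot b + 1, and the last slot is overflow.
    """
    n_bins = view.shape[axis] - 2
    keep = np.asarray(keep, dtype=np.intp)
    if keep.ndim != 1 or keep.size < 2:
        raise ValueError("need a 1D list of at least two edge indices")
    if np.any(np.diff(keep) <= 0):
        raise ValueError("edge indices must be strictly increasing")
    if keep[0] < 0 or keep[-1] > n_bins:
        raise ValueError("edge index out of range")
    # reduceat sums each segment [starts[i], starts[i + 1]):
    #   [0, keep[0] + 1)              -> new underflow (old underflow + bins below)
    #   [keep[j] + 1, keep[j + 1] + 1) -> new bin j
    #   [keep[-1] + 1, end)           -> new overflow (bins above + old overflow)
    starts = np.concatenate(([0], keep + 1))
    return np.add.reduceat(view, starts, axis=axis)


# Underflow 10, five regular bins 1..5, overflow 20; keep only the edges -3 and 3.
print(rebin_axis_reduceat(np.array([10, 1, 2, 3, 4, 5, 20]), 0, [1, 4]))
```

The new edges are simply the old edges at the kept indices, so the rebinned axis can be rebuilt as a variable-width axis from them.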
https://github.com/fabriceMUKARAGE/rebinning_histogram
Based on the feedback above, here is the non-uniform rebinning I was working on. Maybe you can check whether it makes sense. The README.md and the comments in the code explain it better.
@kdlong Thanks! Not a developer but I think it would be a useful PR 🙂
From what I can tell from related posts/issues/features, the real developers are close to having something centrally supported.