
[FEATURE] Non-uniform rebinning

Open yimuchen opened this issue 3 years ago • 10 comments

Right now, histograms can only be rebinned by an integer factor via the hist.rebin indicator. It would be nice if there were some way to rebin a given axis to arbitrary bin edges, as we might want to rebin a regular axis to be irregular just for a low-statistics region, without requiring the exact binning scheme to be known during histogram construction.

I'm not sure what the best method would be; maybe something like extending the existing hist.rebin class to accept hist.rebin( <new_axis>/<new bin edges> ) to specify the new binning scheme of interest?

yimuchen avatar Nov 15 '21 14:11 yimuchen

Below is my own implementation of the histogram manipulation that we want. Right now it only handles NamedHist and is probably very slow for large histograms, but it is a good starting point for describing what we want to do:

We would run it with something like: rebin_hist( h, x=new_x_axis, y=new_y_axis )

https://gist.github.com/yimuchen/a5e200c001ef4ea01681a7dd8fe89162

yimuchen avatar Nov 15 '21 20:11 yimuchen

A nice interface for this would be using array indexing, e.g. doing:

h = Hist(hist.axis.Regular(5, -5, 5))
rebinned_h = h[ [[0], [1,2,3], [4]] ]

This would return a new histogram with variable binning, with the central three bins merged into one.
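To make the intended result of the index-group interface concrete, here is a minimal sketch in plain Python. It does not use hist at all: `merge_bins` is a hypothetical helper, and the histogram is represented only by a list of bin counts and a list of bin edges (assumptions made for illustration). The edges match `Regular(5, -5, 5)` from the example above.

```python
def merge_bins(counts, edges, groups):
    """Merge adjacent bins of a 1D histogram according to index groups.

    Hypothetical helper for illustration only; not part of hist.
    counts: one value per bin; edges: len(counts) + 1 bin edges;
    groups: lists of consecutive bin indices, e.g. [[0], [1, 2, 3], [4]].
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    if any(len(g) == 0 for g in groups):
        raise ValueError("empty groups are not allowed")
    # Every bin must appear exactly once and in order, so that only
    # neighbouring bins are merged and no content is dropped.
    flat = [i for g in groups for i in g]
    if flat != list(range(len(counts))):
        raise ValueError("groups must cover every bin exactly once, in order")
    # Each new bin holds the sum of the old bins in its group.
    new_counts = [sum(counts[i] for i in g) for g in groups]
    # Each new bin starts at the lower edge of the first old bin in its group.
    new_edges = [edges[g[0]] for g in groups] + [edges[-1]]
    return new_counts, new_edges


counts = [1.0, 2.0, 3.0, 4.0, 5.0]
edges = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]
new_counts, new_edges = merge_bins(counts, edges, [[0], [1, 2, 3], [4]])
print(new_counts)  # [1.0, 9.0, 5.0]
print(new_edges)   # [-5.0, -3.0, 3.0, 5.0]
```

The sketch ignores underflow/overflow bins and weighted storages, which a real implementation inside hist would need to handle.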

swertz avatar Apr 13 '22 10:04 swertz

+1 on this

andrzejnovak avatar Apr 28 '22 13:04 andrzejnovak

If I may add, it would also be nice to be able to rebin a histogram based on a second one:

>>> h1.rebin(h2.axes.edges)

gipert avatar Sep 08 '22 14:09 gipert

I just ran into a setup where I was looking for such a feature as well. Both the ability to specify new bin edges explicitly, and the possibility to pick specific bins to be merged (like @swertz's example) would be very useful!

alexander-held avatar Oct 04 '22 22:10 alexander-held

+1 on this. Having this functionality would make it a lot easier to produce quality plots for coffea-based analyses.

garvitaa avatar Jan 09 '23 13:01 garvitaa

Probably others have too, but I've written a function for this for my own studies using hist: https://gist.github.com/kdlong/d697ee691c696724fc656186c25f8814

I think it differs from the previous implementation in that it uses np.add.reduceat, so it shouldn't be so slow. I have fought with some details of it (like treating the overflow and underflow when rebinning to a subset of the range), and I think I've validated it, but I wouldn't swear in blood that there aren't mistakes. Others can try it out if it's useful, and I could convert it to a PR if it goes in the direction the developers would want.
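For readers unfamiliar with the technique, the following is a minimal sketch of rebinning with np.add.reduceat on bare NumPy arrays. It is not the code from the gist: `rebin_to_edges` is a hypothetical name, the new edges are assumed to be an exact subset of the old edges, and any pre-existing flow bins of the original histogram are assumed to be absent. Content outside the new range is returned separately as underflow and overflow.

```python
import numpy as np


def rebin_to_edges(counts, old_edges, new_edges):
    """Sum 1D bin contents onto coarser edges using np.add.reduceat.

    Hypothetical helper for illustration; new_edges must be a strictly
    increasing subset of old_edges. Returns (merged, underflow, overflow).
    """
    counts = np.asarray(counts, dtype=float)
    old_edges = np.asarray(old_edges, dtype=float)
    new_edges = np.asarray(new_edges, dtype=float)
    if len(new_edges) < 2:
        raise ValueError("need at least two new edges")

    # Index of each new edge within the old edges.
    idx = np.searchsorted(old_edges, new_edges)
    if idx.max() >= len(old_edges) or not np.allclose(old_edges[idx], new_edges):
        raise ValueError("new edges must coincide with existing edges")
    if np.any(np.diff(idx) <= 0):
        raise ValueError("new edges must be strictly increasing")

    # reduceat sums counts[idx[i]:idx[i+1]]; its last segment runs to the end
    # of the array, so trim everything above the last new edge first.
    merged = np.add.reduceat(counts[: idx[-1]], idx[:-1])
    underflow = counts[: idx[0]].sum()   # content below the first new edge
    overflow = counts[idx[-1]:].sum()    # content above the last new edge
    return merged, underflow, overflow


counts = [1, 2, 3, 4, 5, 6]
old_edges = [0, 1, 2, 3, 4, 5, 6]
merged, under, over = rebin_to_edges(counts, old_edges, [1, 2, 5])
print(merged, under, over)  # [ 2. 12.] 1.0 6.0
```

Variances of a weighted histogram add bin by bin in the same way, so the same call could be applied to a variance array; that extension is left out here.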

kdlong avatar Jan 19 '23 21:01 kdlong

https://github.com/fabriceMUKARAGE/rebinning_histogram

Based on the feedback above, here is the non-uniform rebinning I was working on. Maybe you can check it out and see if it makes sense. README.md and the comments in the code explain it better, I guess.

fabriceMUKARAGE avatar Jul 07 '23 11:07 fabriceMUKARAGE

@kdlong Thanks! Not a developer but I think it would be a useful PR 🙂

rkansal47 avatar Jul 22 '23 15:07 rkansal47

From what I can tell looking at related posts/issues/features the real developers are close to having something centrally supported.

kdlong avatar Jul 24 '23 11:07 kdlong