feat: support full UHI for rebinning

Open Saransh-cpp opened this issue 1 year ago • 3 comments

XRef #208

The current interface:

In [1]: import numpy as np
   ...: 
   ...: import boost_histogram as bh
   ...: 
   ...: h = bh.Histogram(bh.axis.Regular(10, 0, 1))
   ...: h.fill(np.random.normal(size=1_000_000))
   ...: rebin = bh.tag.Rebinner(factor=2)
   ...: h[::rebin]
Out[1]: Histogram(Regular(5, 0, 1), storage=Double()) # Sum: 341605.0 (1000000.0 with flow)

In [2]: rebin = bh.tag.Rebinner(groups=[1, 2, 3])

In [3]: h[::rebin]
Out[3]: Histogram(Variable([0, 0.1, 0.3, 0.6], metadata=...), storage=Double()) # Sum: 225749.0

In [4]: s = bh.tag.Slicer()
   ...: 
   ...: h = bh.Histogram(
   ...:     bh.axis.Regular(20, 1, 3),
   ...:     bh.axis.Regular(30, 1, 3),
   ...:     bh.axis.Regular(40, 1, 3),
   ...: )
   ...: 
   ...: h[{0: s[:: bh.rebin(groups=[1, 2, 3])]}].axes.size
Out[4]: (3, 30, 40)

In [5]: h[{0: s[:: bh.rebin(groups=[1, 2, 3])], 2: s[:: bh.rebin(groups=[1, 2, 3])]}].axes[2].edges
Out[5]: array([1.  , 1.05, 1.15, 1.3 ])
  • [ ] The code is a bit dirty and I don't know if it is perfectly optimized.
  • [ ] How should the code handle flow bins?
  • [ ] Is there any edge case that I am missing?
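
To make the flow-bin question concrete, here is a minimal NumPy sketch of what grouped rebinning computes on a single axis. It is not the code from this PR: `rebin_groups` is a hypothetical helper, and returning the uncovered trailing bins as a separate `leftover` sum (which a caller could fold into the overflow bin) is only one possible choice.

```python
import numpy as np


def rebin_groups(edges, counts, groups):
    """Hypothetical helper: merge adjacent bins according to `groups`.

    edges  : bin edges of the original axis (len(counts) + 1 values)
    counts : bin contents without flow bins
    groups : number of original bins merged into each new bin
    """
    edges = np.asarray(edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups, dtype=int)

    if np.any(groups <= 0):
        raise ValueError("every group must contain at least one bin")

    stops = np.cumsum(groups)   # exclusive end index of each group
    starts = stops - groups     # inclusive start index of each group
    if stops[-1] > counts.size:
        raise ValueError("groups cover more bins than the axis has")

    # New edges are the original edges at the group boundaries.
    new_edges = edges[np.concatenate(([0], stops))]
    # Sum the original bins inside each group.
    new_counts = np.add.reduceat(counts[: stops[-1]], starts)
    # Assumption: bins not covered by any group are reported separately;
    # whether they should move into the overflow bin is the open question.
    leftover = counts[stops[-1]:].sum()
    return new_edges, new_counts, leftover


edges = np.linspace(0, 1, 11)   # same binning as Regular(10, 0, 1)
counts = np.arange(10.0)
new_edges, new_counts, leftover = rebin_groups(edges, counts, [1, 2, 3])
```

With `groups=[1, 2, 3]` only the first six bins are covered, so the new edges come out as 0, 0.1, 0.3, 0.6, in line with `Out[3]` above; the remaining four bins end up in `leftover`.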

cc: @henryiii @matthewfeickert

Saransh-cpp, Jan 25 '24 14:01

Ah, yeah, you probably have to use boost-histogram's cast system to go from C++ class to the correct Python class. I can look (hopefully by end of day or tomorrow, as I'll be teaching soon).

henryiii, Jan 25 '24 16:01

Thanks for this very useful feature! I was wondering whether this adds (or could add) support for renaming categorical axis values as well.

rkansal47, Apr 12 '24 20:04

I still need to review this and make it work on callables.

henryiii, Apr 12 '24 21:04