hist
Analysis shortcuts
This is to enable simple, quick analysis in a notebook or REPL. Here are current ideas (to be expanded):
Allow complex numbers for bh.loc
- [x] This would allow bh.loc(1.5) + 2 to be written as 1.5j + 2. This is easier to read and type. We should also allow strings to be used directly.
h[3j, 4j + 1, "hi", True, 3]
# Calls
h[bh.loc(3), bh.loc(4, 1), bh.loc("hi"), True, 3]
complex(a, b) -> bh.loc(b, a)
str(a) -> bh.loc(str(a))
- [ ] We could allow infty to be underflow/overflow
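The mapping above could be sketched as a small pre-processing step applied to the index before it reaches the histogram. This is only an illustration under assumptions: Loc is a stand-in for bh.loc so the snippet runs without boost-histogram, and expand_shortcut / expand_index are hypothetical names, not part of either library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """Stand-in for bh.loc(value, offset); not the real class."""

    value: object
    offset: int = 0


def expand_shortcut(item):
    """Turn one shortcut index into a Loc; leave everything else alone."""
    if isinstance(item, complex):
        # 4j + 1 is complex(1, 4): imaginary part = value, real part = offset
        if item.real != int(item.real):
            raise ValueError("the offset (real part) must be a whole number")
        return Loc(item.imag, int(item.real))
    if isinstance(item, str):
        return Loc(item)
    return item  # plain ints, bools, slices, ... pass through unchanged


def expand_index(index):
    """Apply expand_shortcut to every entry of a (possibly 1D) index."""
    if not isinstance(index, tuple):
        index = (index,)
    return tuple(expand_shortcut(item) for item in index)


expanded = expand_index((3j, 4j + 1, "hi", True, 3))
print(expanded)
```

Slice endpoints such as h[1j:3j] are not expanded here; a fuller version would have to look inside slices as well.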
Allow sum for bh.sum -> Done, added to boost-histogram
We can check for the Python sum, allowing h[::bh.sum] to be written as h[::sum].
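As a rough sketch of that check: the slice step can be compared by identity against the built-in sum and swapped for the library's own tag. BH_SUM below is a stand-in for bh.sum so the snippet has no dependencies, and normalize_sum is a hypothetical helper name.

```python
class SumTag:
    """Stand-in for bh.sum; not the real object."""


BH_SUM = SumTag()


def normalize_sum(index):
    """Replace a built-in `sum` used as a slice step with the histogram tag."""
    if isinstance(index, slice) and index.step is sum:
        return slice(index.start, index.stop, BH_SUM)
    return index  # anything else is left untouched


converted = normalize_sum(slice(None, None, sum))  # what h[::sum] passes in
print(converted.step is BH_SUM)
```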
Allow auto full-range slicing -> Add to boost-histogram
If someone does h[bh.rebin(2)], that possibly could implement h[::bh.rebin(2)] automatically. This is a bit tricky (since single callables already are supported like bh.loc), but might be possible.
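One way this might work is to test for the specific type instead of "any callable", so that locators are left alone. The sketch below assumes rebin objects can be recognised with isinstance; Rebin and Loc are stand-ins for bh.rebin and bh.loc, and auto_slice is a hypothetical helper.

```python
class Rebin:
    """Stand-in for bh.rebin(factor)."""

    def __init__(self, factor):
        self.factor = factor


class Loc:
    """Stand-in for bh.loc(value)."""

    def __init__(self, value):
        self.value = value


def auto_slice(item):
    """Wrap a bare rebin action in a full-range slice; pass the rest through."""
    if isinstance(item, Rebin):
        return slice(None, None, item)  # same as writing h[::rebin]
    return item


rebin = Rebin(2)
loc = Loc(1.5)
print(auto_slice(rebin))        # a slice whose step is the rebin object
print(auto_slice(loc) is loc)   # locators are not wrapped
```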
Axes and storage
- [x] See below
class Hist:
    Regular = HistAxesProxy(axes=Regular)

Hist.Regular -> HistAxesProxy(axes=Regular)  # Only one with settable storage
Hist.Int64 -> HistStorageProxy(storage=Regular)
HistAxesStorage()
import boost_histogram as bh
from hist import Hist, axis
# Current method
h = Hist(axis.Regular(10,0,1), storage=bh.storage.Int64())
# Not helpful in IPython
h = Hist(axis.Regular(10,0,1), storage="Int64")
# Original idea
h = Hist.Double.Regular(10, 0, 1, 10, 0, 1)
h = Hist.Regular.Double(10, 0, 1, 10, 0, 1)
h = Hist.Regular.Double(10, 0, 1, "a", 10, 0, 1, "b")
# Additional idea
h = Hist.Regular.Double((10, 0, 1, "a"), (10, 0, 1, "b"))
h = Hist.Regular.Double((10, 0, 1, "a"))
# Jim's idea (I like this)
h = Hist.Regular(10, 0, 1) # Could be done, I think
h = Hist.Regular(10, 0, 1).Regular(10, 0, 1).Double()
h = Hist.Regular(10, 0, 1, "a").Regular(10, 0, 1, name="b").Double()
Quick axes setup
- [x] We should allow flow=False to disable both flow bins.
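A minimal sketch of how a single flow keyword could be translated into the separate underflow and overflow options that boost-histogram axes accept. The helper name resolve_flow, and the rule that an explicit underflow/overflow setting wins over flow, are assumptions made for illustration.

```python
def resolve_flow(flow=True, underflow=None, overflow=None):
    """Return (underflow, overflow); explicit settings override `flow`."""
    resolved_underflow = flow if underflow is None else underflow
    resolved_overflow = flow if overflow is None else overflow
    return resolved_underflow, resolved_overflow


print(resolve_flow(flow=False))                 # both flow bins disabled
print(resolve_flow(flow=False, overflow=True))  # only the overflow bin kept
```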
Note: these could be opt-in at first, since other ideas have been discussed. If they become popular and work well, they might even be able to be upstreamed into boost-histogram. :)
It seems that h[::bh.sum] is not supported currently.
This would allow bh.loc(1.5) + 2 to be written as 1.5j + 2. This is easier to read and type.
@henryiii, what does j mean? Could it possibly be any other characters like i? And could the index overflow after adding 2 (bh.loc(1.5) + 2 should be an index OR this is to add 2 to every location with 1.5)?
A possible shortcut, here.
Should the Hist object own a name? Should the transform be available for Hist object (for every axis)? ...
pros:
- Users can directly change (perform) many axes at one time.
- ...
cons (challenges):
- Ambiguous name
- Fill method
- ...
what does j mean? Could it possibly be any other characters like i?
It's using the fact that Python has built-in support for complex numbers; 2j+1 becomes complex(1, 2). So we are stuck with j. Complex numbers can never be a valid 1D index (since they are 2D), so it's safe (IMO) to use them. The same trick is used by numpy in np.mgrid and several other places.
A user doesn't actually need to know j is a complex number, though. For our use case it's only an indicator that we want data coordinates.
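The complex-literal behaviour described above can be checked in plain Python. This only demonstrates the language feature, not anything in hist or boost-histogram.

```python
z = 2j + 1

# The literal is an ordinary complex number: real part 1, imaginary part 2
print(z == complex(1, 2))
print(z.real, z.imag)

# Ordinary sequences reject complex indices, so the notation is free to reuse
try:
    [10, 20, 30][1j]
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```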
And could the index overflow after adding 2 (bh.loc(1.5) + 2 should be an index OR this is to add 2 to every location with 1.5)?
This does what bh.loc(1.5) + 2 already does; it looks up the bin number that contains 1.5, then it adds two to that number. If that overflows, then it overflows.
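To make the "look up the bin, then add the offset" rule concrete, here is a small sketch on a stand-in axis. The edges list, bin_index, and resolve_loc are hypothetical and only model the arithmetic; they are not how boost-histogram implements bh.loc.

```python
import bisect

edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # 5 regular bins, indices 0..4


def bin_index(value):
    """Index of the bin whose half-open range [low, high) contains value."""
    return bisect.bisect_right(edges, value) - 1


def resolve_loc(value, offset=0):
    """Mimic loc(value) + offset: find the bin first, then shift the index."""
    return bin_index(value) + offset


near = resolve_loc(1.5, 2)  # 1.5 is in bin 1, so the result is 1 + 2
far = resolve_loc(4.5, 2)   # 4.5 is in bin 4, so the result is 4 + 2
print(near, far)
```

The second result lies past the last regular bin (index 4), which is the "if that overflows, then it overflows" case; what the histogram does with such an index is not modelled here.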
Should the Hist object own a name
Axis names will need to be unique within a hist. This would be a likely source of bugs, and due to the fill/access, could probably be very hard to use as a shortcut. The only way you could even use it would be to mix named access and positional access (regular Hist). Better to just require unique names for all axes in a Hist of any sort.
Here's an outline for how this could work:
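from functools import partial
from inspect import getattr_static

# from boost_histogram import Histogram


class Histogram:
    """Stand-in for boost_histogram.Histogram."""

    def __init__(self, *axes, storage=None):
        self._hist = axes
        self._ax = self._hist
        self._storage = storage

    def do_something(self):
        print(self._hist)


class always_normal_method:
    """Descriptor that works when accessed on the class or on an instance."""

    def __init__(self, method):
        self.method = method

    def __get__(self, instance, owner=None):
        # Accessed on the class: start from a fresh, empty builder
        if instance is None:
            instance = owner()
        return partial(self.method, instance)


class BaseHist(Histogram):
    def __init__(self):
        self._hist = None  # None means "not materialized yet"
        self._ax = []
        self._storage_proxy = None

    @always_normal_method
    def Regular(self):
        if self._hist is not None:
            raise RuntimeError("Cannot add an axis to an existing histogram")
        self._ax.append("Regular")
        return self

    @always_normal_method
    def Variable(self):
        if self._hist is not None:
            raise RuntimeError("Cannot add an axis to an existing histogram")
        self._ax.append("Variable")
        return self

    def __getattribute__(self, item):
        get = object.__getattribute__
        # Underscore names are the builder's own state; looking them up through
        # self here would recurse, so they always use the plain lookup.
        if not item.startswith("_") and get(self, "_hist") is None:
            # getattr_static fetches the raw descriptor without calling __get__
            if not isinstance(getattr_static(type(self), item, None), always_normal_method):
                # Make histogram real here
                super().__init__(*get(self, "_ax"), storage=get(self, "_storage_proxy"))
                self._storage_proxy = None
        return get(self, item)


Hist = BaseHist

# Run
Hist().do_something()
Hist().Regular().do_something()
Hist.Regular().do_something()
Hist.Regular().Variable().do_something()
h = Hist.Regular()
h.do_something()
# h.Regular()  # would raise RuntimeError: the histogram already exists
# h = Hist.Regular(10,0,1).Regular(10,0,1)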
FYI, I added the string shortcut to boost-histogram in a PR, but Hans didn't want it there yet (which I understand), so I've pulled it out - but you might want to see the implementation for adding it to Hist. Here's the commit that removed it: https://github.com/scikit-hep/boost-histogram/pull/386/commits/dd1922b31e8a3680260a01f68547dd17592d664e
For us, we can simply apply bh.loc(x) when the index is a string.