Analysis shortcuts

Open henryiii opened this issue 4 years ago • 8 comments

This is to enable simple, quick analysis in a notebook or REPL. Here are current ideas (to be expanded):

Allow complex numbers for bh.loc

  • [x] This would allow bh.loc(1.5) + 2 to be written as 1.5j + 2. This is easier to read and type. We should also allow strings to be used directly.
h[3j, 4j + 1, "hi", True, 3]
# Calls
h[bh.loc(3), bh.loc(4, 1), bh.loc("hi"), True, 3]
complex(a, b) -> bh.loc(b, a)
str(a) -> bh.loc(str(a))
  • [ ] We could allow infinity (±inf) to mean underflow/overflow
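The mapping above can be sketched in plain Python. This is only a minimal sketch, not the hist implementation: Loc is a hypothetical stand-in for bh.loc (so the snippet runs without boost-histogram), and expand_shortcut is an illustrative name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """Hypothetical stand-in for bh.loc: a data coordinate plus a bin offset."""

    value: object
    offset: int = 0


def expand_shortcut(index):
    """Translate one shortcut index into its explicit form."""
    if isinstance(index, complex):
        # 4j + 1 == complex(1, 4): the imaginary part is the data coordinate,
        # the real part is the integer bin offset.
        return Loc(index.imag, int(index.real))
    if isinstance(index, str):
        return Loc(index)
    # Plain ints and bools are neither complex nor str, so they pass through.
    return index


expanded = tuple(expand_shortcut(i) for i in (3j, 4j + 1, "hi", True, 3))
print(expanded)
```

Because a complex literal always stores its coordinate as a float, 3j arrives as the coordinate 3.0 in this sketch.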

Allow sum for bh.sum -> Done, added to boost-histogram

We can check for the python sum, allowing h[::bh.sum] to be written as h[::sum].
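One way such a check could look, as a rough sketch: BH_SUM is a hypothetical placeholder for bh.sum, and normalize_step is an illustrative helper name, not part of either library.

```python
import builtins

# Hypothetical placeholder object standing in for bh.sum.
BH_SUM = object()


def normalize_step(index):
    """Swap the builtin sum for the histogram sum tag in a slice step."""
    if isinstance(index, slice) and index.step is builtins.sum:
        return slice(index.start, index.stop, BH_SUM)
    # Anything else is returned unchanged.
    return index


converted = normalize_step(slice(None, None, sum))
```

The identity check against builtins.sum means a user-defined function that happens to be named sum would not be converted.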

Allow auto full-range slicing -> Add to boost-histogram

If someone does h[bh.rebin(2)], that possibly could implement h[::bh.rebin(2)] automatically. This is a bit tricky (since single callables already are supported like bh.loc), but might be possible.
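A sketch of one possible way to separate the two kinds of callables, assuming slice actions can be recognized by type. Rebin here is a hypothetical stand-in for bh.rebin, and expand_bare_action is an illustrative name.

```python
class Rebin:
    """Hypothetical stand-in for bh.rebin: an action applied over a slice."""

    def __init__(self, factor):
        self.factor = factor


def expand_bare_action(index):
    """Turn a bare slice action into a full-range slice carrying that action."""
    if isinstance(index, Rebin):
        # h[Rebin(2)] is treated as h[::Rebin(2)]
        return slice(None, None, index)
    # Locators such as bh.loc, plain ints, and slices are left alone.
    return index


full_range = expand_bare_action(Rebin(2))
```

Checking the type rather than callable() is what keeps locators from being wrapped by mistake in this sketch.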

Axes and storage

  • [x] See below

class Hist:
    Regular = HistAxesProxy(axes=Regular)

Hist.Regular -> HistAxesProxy(axes=Regular) # Only one with settable storage
Hist.Int64 -> HistStorageProxy(storage=Regular)

HistAxesStorage()

from hist import Hist, axis

# Current method
h = Hist(axis.Regular(10,0,1), storage=bh.storage.Int64())
# Not helpful in IPython
h = Hist(axis.Regular(10,0,1), storage="Int64")

# Original idea
h = Hist.Double.Regular(10, 0, 1, 10, 0, 1)
h = Hist.Regular.Double(10, 0, 1, 10, 0, 1)
h = Hist.Regular.Double(10, 0, 1, "a", 10, 0, 1, "b")
# Additional idea
h = Hist.Regular.Double((10, 0, 1, "a"), (10, 0, 1, "b"))
h = Hist.Regular.Double((10, 0, 1, "a"))

# Jim's idea (I like this)
h = Hist.Regular(10, 0, 1) # Could be done, I think
h = Hist.Regular(10, 0, 1).Regular(10, 0, 1).Double()
h = Hist.Regular(10, 0, 1, "a").Regular(10, 0, 1, name="b").Double()

Quick axes setup

  • [x] We should allow flow=False to disable both flow bins.
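A small sketch of how a single flow flag could be resolved into the separate underflow and overflow settings. The helper name resolve_flow is hypothetical, and the rule that an explicit underflow or overflow value wins over flow is an assumption.

```python
def resolve_flow(flow=True, underflow=None, overflow=None):
    """Resolve the flow shortcut into separate underflow/overflow settings."""
    return {
        # Assumption: an explicitly passed value takes precedence over flow.
        "underflow": flow if underflow is None else underflow,
        "overflow": flow if overflow is None else overflow,
    }


no_flow = resolve_flow(flow=False)
```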

henryiii avatar Mar 17 '20 18:03 henryiii

Note: these could be opt-in at first, since other ideas have been discussed. If they become popular and work well, they might even be able to be upstreamed into boost-histogram. :)

henryiii avatar Mar 17 '20 18:03 henryiii

It seems h[::bh.sum] is not currently supported.

LovelyBuggies avatar Mar 18 '20 10:03 LovelyBuggies

This would allow bh.loc(1.5) + 2 to be written as 1.5j + 2. This is easier to read and type.

@henryiii, what does j mean? Could it possibly be any other characters like i? And could the index overflow after adding 2 (bh.loc(1.5) + 2 should be an index OR this is to add 2 to every location with 1.5)?

LovelyBuggies avatar Mar 21 '20 13:03 LovelyBuggies

A possible shortcut, here.

Should the Hist object own a name? Should the transform be available for Hist object (for every axis)? ...

pros:

  • Users can directly change (perform) many axes at one time.
  • ...

cons (challenges):

  • Ambiguous name
  • Fill method
  • ...

LovelyBuggies avatar Mar 24 '20 16:03 LovelyBuggies

what does j mean? Could it possibly be any other characters like i?

It's using the fact that Python has built-in support for complex numbers; 2j+1 becomes complex(1, 2), that is, real part 1 and imaginary part 2. So we are stuck with j. Complex numbers can never be a valid 1D index (since they are 2D), so it's safe (IMO) to use them. The same trick is used by numpy in np.mgrid and several other places.

A user doesn't actually need to know j is a complex number, though. For our use case it's only an indicator that we want data coordinates.

And could the index overflow after adding 2 (bh.loc(1.5) + 2 should be an index OR this is to add 2 to every location with 1.5)?

This does what bh.loc(1.5) + 2 already does; it looks up the bin number that contains 1.5, then it adds two to that number. If that overflows, then it overflows.

henryiii avatar Mar 24 '20 18:03 henryiii

Should the Hist object own a name

Axis names will need to be unique within a hist. This would be a likely source of bugs, and due to the fill/access, could probably be very hard to use as a shortcut. The only way you could even use it would be to mix named access and positional access (regular Hist). Better to just require unique names for all axes in a Hist of any sort.

henryiii avatar Mar 24 '20 18:03 henryiii

Here's an outline for how this could work:

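import inspect
from functools import partial


# Stand-in for: from boost_histogram import Histogram
class Histogram:
    def __init__(self, *axes, storage=None):
        # storage is accepted but unused in this stand-in
        self._hist = axes
        self._ax = self._hist

    def do_something(self):
        print(self._hist)


class always_normal_method:
    """Descriptor that binds to the instance, or to a new instance on class access."""

    def __init__(self, method):
        self.method = method

    def __get__(self, instance, owner=None):
        # Hist.Regular (class access) creates a new, empty Hist first
        return partial(self.method, owner() if instance is None else instance)


class BaseHist(Histogram):
    def __init__(self):
        self._hist = None  # None until the histogram is made real
        self._ax = []
        self._storage_proxy = None

    @always_normal_method
    def Regular(self):
        if self._hist is not None:
            raise RuntimeError("Cannot add an axis to an existing histogram")
        self._ax.append("Regular")
        return self

    @always_normal_method
    def Variable(self):
        if self._hist is not None:
            raise RuntimeError("Cannot add an axis to an existing histogram")
        self._ax.append("Variable")
        return self

    def __getattribute__(self, item):
        # Go through object.__getattribute__ so this method does not recurse into itself
        get = object.__getattribute__
        # getattr_static looks the name up on the class without triggering the descriptor
        is_builder = isinstance(
            inspect.getattr_static(type(self), item, None), always_normal_method
        )
        # Private attributes and axis builders must not trigger construction
        if not item.startswith("_") and not is_builder and get(self, "_hist") is None:
            # Make histogram real here
            Histogram.__init__(
                self, *get(self, "_ax"), storage=get(self, "_storage_proxy")
            )
            self._storage_proxy = None

        return get(self, item)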

Hist = BaseHist

# Run
Hist().do_something()

Hist().Regular().do_something()

Hist.Regular().do_something()

Hist.Regular().Variable().do_something()

h = Hist.Regular()
h.do_something()
# h.Regular()


# h = Hist.Regular(10,0,1).Regular(10,0,1)

henryiii avatar Jun 23 '20 13:06 henryiii

FYI, I added the string shortcut to boost-histogram in a PR, but Hans didn't want it there yet (which I understand), so I've pulled it out - but you might want to see the implementation for adding it to Hist. Here's the commit that removed it: https://github.com/scikit-hep/boost-histogram/pull/386/commits/dd1922b31e8a3680260a01f68547dd17592d664e

For us, we can simply apply bh.loc(x) when x is a string.

henryiii avatar Jul 02 '20 13:07 henryiii