
Vectorize at/__getitem__

Open henryiii opened this issue 4 years ago • 2 comments

  • [ ] Vectorize _at
  • [ ] Vectorize _at_set
  • [ ] Vectorize __getitem__, __setitem__ (these use the above functions internally)
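As a rough illustration of what a vectorized `_at` could look like (a hypothetical sketch, not boost-histogram's actual internals): given arrays of per-axis bin indices, a single `np.ravel_multi_index` plus a flat gather replaces a Python-level loop of scalar `at()` calls.

```python
import numpy as np

# Stand-in for a histogram's storage array
counts = np.arange(24).reshape(2, 3, 4)

# Look up several (i, j, k) cells in one vectorized call
i = np.array([0, 1, 1])
j = np.array([2, 0, 1])
k = np.array([3, 3, 0])

flat = np.ravel_multi_index((i, j, k), counts.shape)
values = counts.flat[flat]
print(values)  # -> [11 15 16]
```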

henryiii avatar Oct 17 '19 09:10 henryiii

This is a bit tricky to implement; I've started it, but pybind11 doesn't provide runtime utilities for array access, and I don't want to generate 32 copies of this, so it will likely miss the 1.0 target. I think that's fine, as no one has been too worried about missing this so far. The easy buffer access with .view() and such makes it a bit less important.

henryiii avatar Feb 09 '21 14:02 henryiii

Hi @henryiii @HDembinski ,

I assume the following is related; if not, please correct me and I'll open a fresh issue. We noticed in the scope of our analysis that __getitem__ is a performance bottleneck for high-dimensional histograms (imagine: a dataset axis with O(1000) datasets, a category axis with O(100) categories, and a systematic axis with O(100) shifts).

Here is a snippet that makes the performance difference visible:

```python
import boost_histogram as bh

h = bh.Histogram(
    bh.axis.StrCategory([str(i) for i in range(100)]),  # e.g. datasets
    bh.axis.StrCategory([str(i) for i in range(100)]),  # e.g. categories
    bh.axis.StrCategory([str(i) for i in range(100)]),  # e.g. systematics
    bh.axis.Regular(100, 0, 500),
)

# let's fill a dummy value
h[...] = 1.0

# now the __getitem__ performance:
%timeit h[bh.loc("42"), bh.loc("42"), bh.loc("42"), :].view()
# 4.08 s ± 61.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit h.view()[h.axes[0].index("42"), h.axes[1].index("42"), h.axes[2].index("42"), :]
# 20.3 µs ± 669 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Currently we use the second option, since on a larger analysis scale, with multiple of these huge histograms, this makes the difference between O(hours) and O(seconds) for histogram manipulation such as grouping datasets into physics processes. However, the first option is obviously much more convenient to use. I think this would be a major improvement, especially for the usability of hist and boost_histogram in large-scale analysis.
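The fast second option can be wrapped in a small helper. Here is a minimal, library-agnostic sketch of the pattern (plain NumPy, with label-to-index dicts standing in for `StrCategory.index`; the helper names `make_indexer` and `fast_slice` are hypothetical):

```python
import numpy as np

def make_indexer(labels_per_axis):
    """Build one label -> bin-index lookup table per category axis."""
    return [{lbl: i for i, lbl in enumerate(labels)} for labels in labels_per_axis]

def fast_slice(view, tables, *keys):
    """Resolve string keys to integer indices, then index the raw view.
    A trailing Ellipsis keeps any remaining (e.g. Regular) axes intact."""
    idx = tuple(t[k] for t, k in zip(tables, keys))
    return view[idx + (Ellipsis,)]

# Toy stand-in: three category axes of size 3 plus a 5-bin regular axis
labels = [["a", "b", "c"]] * 3
tables = make_indexer(labels)
view = np.arange(3 * 3 * 3 * 5, dtype=float).reshape(3, 3, 3, 5)

print(fast_slice(view, tables, "b", "a", "c"))  # the 5 bins at ("b", "a", "c")
```

The dict lookups are O(1) per axis, so the cost is dominated by the NumPy slice itself, matching the fast timing above.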

Best, Peter

pfackeldey avatar Sep 14 '21 15:09 pfackeldey