boost-histogram Multi weight storage

This PR adds a multi weight storage type (called MultiWeight) to boost-histogram. It addresses #83 and is bases on a prototype provided by @HDembinski in https://github.com/boostorg/histogram/issues/211.

It allows one to create a histogram that can store multiple independent weights per bin (the number of weights per bin has to be specified when creating the histogram).

Example

Create and fill a 1 dim histogram that stores 3 weights per bin:

import boost_histogram as bh
import numpy as np
x = np.array([1, 2])
weights = np.array([
        [1, 2, 3],
        [4, 5, 6]
    ])
h = bh.Histogram(bh.axis.Regular(5, 0, 5), storage = bh.storage.MultiWeight(3))
h.fill(x, sample = weights)

h[1] = [1, 2, 3]
h[2] = [4, 5, 6]

Status of this PR

The MultiWeight histograms can be created in python and they can be filled. Pickle also works.

The buffer and view for the histograms has yet to be implemented.

May 26 '25 11:05 Superharz

Some numbers comparing a 8*8 2D MultiWeight histogram with 20k weights vs 20k normal (single weights) histograms: Filling both with 100k random values takes about 1min 26s for a loop over all 20k single weights histograms vs 1.71s to fill the MultiWeight histogram with the same data. All single weight histograms together take up about 51 MB of RAM on my machine vs the one MultiWeight histogram only taking up about 15.3 MB

May 26 '25 14:05 Superharz

h.values() does not work yet. However, the important part (for me) is to fill the histograms not to retrieve the data from them. At the end, the data can be converted into a numpy array by a small loop over the histogram:

For a 8*8 2D histogram h with 20k weights:

a = np.zeros((8,8, 20000))
for i in range(8):
    for j in range(8):
        a[i,j] = np.array(h[i,j])

May 26 '25 16:05 Superharz

Wow, that's quite an impressive patch. I really appreciate the benchmarks, which nicely confirm the expected benefits. I hope you have more time to implement the changes.

I suggest we work on the implementation here and backport it to boostorg/histogram later. In the end, both libraries should be in sync.

May 27 '25 08:05 HDembinski

Hi @HDembinski . Thank you for reviewing this. How would you like me to address your comments? Should I do one commit per comment or fix everything and do one commit then?

May 27 '25 08:05 Superharz

Within a PR you don't need to do clean commits, I won't look at the commit history. Feel free to fix multiple issues in one commit.

May 27 '25 08:05 HDembinski

By the way, once this feature is done, I suggest you present this at the next PyHEP https://indico.cern.ch/event/1515852/ or the one next year, depending on how quickly we get this done. It is a major feature, and you deserve recognition for implementing this. I left science, so I don't go to any of these workshops anymore.

May 27 '25 08:05 HDembinski

Hi @HDembinski, I addressed the first batch of your comments. For the remaining ones I need some input from your side on how to continue :)

May 27 '25 15:05 Superharz

h.fill(x, sample = weights)

I didn't notice before, but you use sample to pass the weights, that's breaking interface assumptions. Weights should be passed with the weight keyword. I see that I did that, too, in my demo. I guess it was the easiest hack to make it work, but that's breaking all kinds of interface contracts.

Jun 22 '25 12:06 HDembinski

h.fill(x, sample = weights)

I didn't notice before, but you use sample to pass the weights, that's breaking interface assumptions. Weights should be passed with the weight keyword.

I completely agree. This was taken over from your original prototype and it worked, therefore I kept it.

Jun 22 '25 12:06 Superharz

However, it is kinda nice that it basically works with sample, so maybe it is enough to change the name of the storage to multi_count, so avoid having name "weight" in the name which gives the wrong idea.

Jun 22 '25 12:06 HDembinski

@Superharz I considering moving development of this feature back to boostorg/histogram. I find it easier to develop C++ code there. I will merge your changes here to that branch.

Jun 22 '25 12:06 HDembinski

Upstream: https://github.com/boostorg/histogram/pull/411. I assume we can work on this, and move to upstream whenever that goes in and gets released. I've fixed up the clang-tidy recommendations, and started adding some of the missing things. I added .nelem, and added it to the repr. It also needs to support .view, .values, etc.

This is far enough from being ready that I'm not going to try to get it in for the next release, but it can probably be in the following one.

Jul 31 '25 18:07 henryiii