Histogram with WeightedMean storage returns wrong sum_of_weights_squared

Open olbessid opened this issue 1 year ago • 4 comments

I want to create histograms and access their sum of weights squared. With WeightedMean storage, sum_of_weights_squared just returns the number of entries, not the sum of weights squared. The same is true for sum_of_weights (it also returns the counts), but this is a smaller issue for me.

I could in principle retrieve the correct sum of weights squared if I used accumulators instead of histograms. However, for the purpose of my data analysis, this would slow down the code a lot and I would need to replicate the large nested structure of the histograms into accumulators. So I would much prefer to just use histograms, if this bug can be fixed.

To test:

import boost_histogram as bh
h = bh.Histogram(bh.axis.Regular(1, 0, 2), storage=bh.storage.WeightedMean())  # Double() is the default
h.fill([1]*3, sample=[2]*3)
h.view().sum_of_weights_squared

The last line returns

array([3.])

while the sum of weights squared is actually 12.

I am using Python 3.8. A screenshot of my notebook is attached (notebook_weightssquared).

olbessid avatar Apr 22 '24 15:04 olbessid

That's odd. @henryiii ?

HDembinski avatar Jun 03 '24 20:06 HDembinski

Using https://pyodide.org/en/stable/console.html because it's handy:

Screenshot 2024-06-03 at 4 46 00 PM

(Edit: chopped off the answer by mistake)

henryiii avatar Jun 03 '24 20:06 henryiii

Adding the copy-pasteable code from Henry's answer. The weight argument was missing from the fill command:

import boost_histogram as bh
h = bh.Histogram(bh.axis.Regular(1, 0, 2), storage=bh.storage.WeightedMean())
h.fill([1]*3, weight=2, sample=[2]*3)  # note use of weight here
# Histogram(Regular(1, 0, 2), storage=WeightedMean()) # Sum: WeightedMean(sum_of_weights=6, sum_of_weights_squared=12, value=2, variance=0)
h.view().sum_of_weights_squared
# array([12.])
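
For anyone who wants to check the numbers by hand, here is a minimal pure-Python sketch of the per-bin arithmetic. It is not boost-histogram's implementation; the helper name weighted_sums is hypothetical, and the sketch assumes that an omitted weight means a weight of 1 for every entry, which is consistent with the array([3.]) reported above.

```python
def weighted_sums(samples, weights=None):
    """Return (sum_of_weights, sum_of_weights_squared, weighted_mean) for one bin.

    Hypothetical helper for illustration only; assumes that a missing
    weight means every entry has weight 1.
    """
    if weights is None:
        weights = [1.0] * len(samples)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    mean = sum(w * x for w, x in zip(weights, samples)) / sum_w
    return sum_w, sum_w2, mean


samples = [2.0] * 3

# Only sample= given: the weights are all 1, so both sums equal the entry count.
no_weight = weighted_sums(samples)

# weight=2 for every entry: sum of weights is 6, sum of weights squared is 12.
with_weight = weighted_sums(samples, weights=[2.0] * 3)

print(no_weight)    # (3.0, 3.0, 2.0)
print(with_weight)  # (6.0, 12.0, 2.0)
```

In other words, sample sets the values being averaged in each bin, while weight sets how much each entry counts. Passing 2 as the sample changes the mean but leaves both weight sums at 3.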

matthewfeickert avatar Sep 13 '24 06:09 matthewfeickert

@olbessid if this is clear, can the issue be closed?

matthewfeickert avatar Sep 13 '24 22:09 matthewfeickert