uproot5

Cannot convert `TH1` to hist via `.to_boost()`

Open francesco-curcio opened this issue 3 years ago • 1 comments

I'm trying to read the histogram "hist" from the attached file, file.root.zip. Opening the file with `uproot.open` works, but converting the histogram to a boost-histogram or hist object with `to_boost()` or `to_hist()` raises:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [312], line 2
      1 f = uproot.open("file.root")
----> 2 f['hist'].to_hist().plot()

File ~/opt/anaconda3/envs/ml/lib/python3.10/site-packages/uproot/behaviors/TH1.py:206, in Histogram.to_hist(self, metadata, axis_metadata)
    195 def to_hist(self, metadata=boost_metadata, axis_metadata=boost_axis_metadata):
    196     """
    197     Args:
    198         metadata (dict of str → str): Metadata to collect (keys) and
   (...)
    203     Converts the histogram into a ``hist`` object.
    204     """
    205     return uproot.extras.hist().Hist(
--> 206         self.to_boost(metadata=boost_metadata, axis_metadata=boost_axis_metadata)
    207     )

File ~/opt/anaconda3/envs/ml/lib/python3.10/site-packages/uproot/behaviors/TH1.py:325, in TH1.to_boost(self, metadata, axis_metadata)
    323     view.variance = sumw2
    324 else:
--> 325     view[...] = values
    327 return out

File ~/opt/anaconda3/envs/ml/lib/python3.10/site-packages/boost_histogram/_internal/view.py:51, in View.__setitem__(self, ind, value)
     49     super().__setitem__(ind, array)  # type: ignore[no-untyped-call]
     50 else:
---> 51     raise ValueError("Needs matching ndarray or n+1 dim array")

ValueError: Needs matching ndarray or n+1 dim array

The `to_numpy()` method, however, works just fine.

Best, Francesco

francesco-curcio, Sep 20 '22 07:09

@jpivarski do you know why we slice the values for string categories here?

https://github.com/scikit-hep/uproot5/blame/092bc679b215d565652c23b04a782dbf6ba38e3a/src/uproot/behaviors/TH1.py#L317-L318

I assume this is the reason for this error; the subsequent branch for weight / non-weight handling erroneously chooses the non-weight path:

https://github.com/scikit-hep/uproot5/blame/092bc679b215d565652c23b04a782dbf6ba38e3a/src/uproot/behaviors/TH1.py#L321-L325

Regardless of what we're supposed to do with sumw2 in the first sample, we probably ought to use the type of storage to determine whether to set weights, because that will better reflect our intention.
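
One way to read this suggestion is to branch on the storage layout of the output histogram instead of on whether `fSumw2` happens to be filled. The following is a minimal sketch using only NumPy: a structured array stands in for a weighted-storage view, and `make_view` and `fill_view` are hypothetical names, not uproot or boost-histogram functions. Falling back to the bin contents as variances when no sumw2 is stored is an assumption of this sketch, not something decided in this thread.

```python
import numpy as np

# Structured dtype standing in for a weighted-storage view
# (one value and one variance per bin). Illustration only.
WEIGHT_DTYPE = np.dtype([("value", "f8"), ("variance", "f8")])


def make_view(n_bins, weighted):
    """Hypothetical stand-in for a histogram view; not the boost-histogram API."""
    return np.zeros(n_bins, dtype=WEIGHT_DTYPE if weighted else np.float64)


def fill_view(view, values, sumw2):
    """Pick the branch from the storage layout, not from len(sumw2)."""
    if view.dtype.names is not None:
        # Weighted storage: both fields must be set.
        view["value"] = values
        # Assumption of this sketch: with no stored sumw2,
        # use the bin contents as the variances.
        view["variance"] = sumw2 if len(sumw2) == len(values) else values
    else:
        # Plain storage: only the bin contents are stored.
        view[...] = values
    return view


values = np.array([1.0, 2.0, 3.0])
empty_sumw2 = np.array([])

weighted = fill_view(make_view(3, weighted=True), values, empty_sumw2)
plain = fill_view(make_view(3, weighted=False), values, empty_sumw2)
```

With this layout-driven branch, an empty sumw2 array can no longer send a weighted view down the plain-assignment path.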

agoose77, Sep 20 '22 07:09

I think this is related to the problems I am facing in https://github.com/scikit-hep/uproot5/pull/764. I am looking into it and will probably also solve this issue.

This code reproduces the problem with the weights. I will add a test based on the code below.

import hist
import numpy as np
import pytest
import ROOT
import uproot

print(f'{hist.__version__=}')
print(f'{uproot.__version__=}')

newfile = "tmp.root"
h = ROOT.TH1D("h", "h", 10, 0., 1.)
h.FillRandom("gaus", 10000)

assert len(h.GetSumw2()) == 0  # it is supposed to be zero

fout = ROOT.TFile(newfile, "RECREATE")
h.Write()
fout.Close()

# open same hist with uproot
with uproot.open(newfile) as fin:
    h1 = fin["h"]

assert len(h1.axes) == 1
assert h1.axis(0).edges().tolist() == pytest.approx([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
assert len(h1.member("fSumw2")) == 0

# convert to hist
h2 = h1.to_hist()
assert str(h2.storage_type) == "<class 'boost_histogram.storage.Double'>"
assert len(h2.axes) == 1
# why is this failing? returns [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
assert h2.axes[0].edges.tolist() == pytest.approx([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

# write and read again
with uproot.recreate(newfile) as fout2:
    fout2["h"] = h2

with uproot.open(newfile) as fin2:
    h3 = fin2["h"]

# problems start here
assert len(h3.axes) == 1
assert h3.axis(0).edges().tolist() == pytest.approx([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

# ERROR
assert len(h3.member("fSumw2")) == 0, "this should be 0, but it's not!"
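
The edge mismatch noted in the reproducer (edges coming back as 0, 1, ..., 10 instead of 0.0, 0.1, ..., 1.0) can be checked without ROOT. This is a minimal sketch assuming only NumPy; `regular_edges` and `looks_like_bin_indices` are hypothetical helper names, not part of uproot or hist. It only helps spot the symptom and does not explain its cause.

```python
import numpy as np


def regular_edges(n_bins, low, high):
    """Edges expected for a regular axis with n_bins bins on [low, high]."""
    return np.linspace(low, high, n_bins + 1)


def looks_like_bin_indices(edges):
    """True if the edges are just 0, 1, 2, ..., i.e. bin indices rather than coordinates."""
    edges = np.asarray(edges, dtype=float)
    return np.array_equal(edges, np.arange(len(edges)))


expected = regular_edges(10, 0.0, 1.0)
# Edge values quoted in the reproducer's comment.
observed = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

edges_match = np.allclose(observed, expected)
index_like = looks_like_bin_indices(observed)
```

Here `edges_match` is false and `index_like` is true, which matches the failing assertion in the reproducer.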

lobis, Oct 31 '22 21:10