Cannot convert `TH1` to hist via `.to_boost()`
I'm trying to read the histogram "hist" from the attached file (file.root.zip). Opening it with uproot.open works, but converting it to a boost-histogram or hist object with to_boost() or to_hist() fails with:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [312], line 2
1 f = uproot.open("file.root")
----> 2 f['hist'].to_hist().plot()
File ~/opt/anaconda3/envs/ml/lib/python3.10/site-packages/uproot/behaviors/TH1.py:206, in Histogram.to_hist(self, metadata, axis_metadata)
195 def to_hist(self, metadata=boost_metadata, axis_metadata=boost_axis_metadata):
196 """
197 Args:
198 metadata (dict of str → str): Metadata to collect (keys) and
(...)
203 Converts the histogram into a ``hist`` object.
204 """
205 return uproot.extras.hist().Hist(
--> 206 self.to_boost(metadata=boost_metadata, axis_metadata=boost_axis_metadata)
207 )
File ~/opt/anaconda3/envs/ml/lib/python3.10/site-packages/uproot/behaviors/TH1.py:325, in TH1.to_boost(self, metadata, axis_metadata)
323 view.variance = sumw2
324 else:
--> 325 view[...] = values
327 return out
File ~/opt/anaconda3/envs/ml/lib/python3.10/site-packages/boost_histogram/_internal/view.py:51, in View.__setitem__(self, ind, value)
49 super().__setitem__(ind, array) # type: ignore[no-untyped-call]
50 else:
---> 51 raise ValueError("Needs matching ndarray or n+1 dim array")
ValueError: Needs matching ndarray or n+1 dim array
The to_numpy() method, however, works just fine.
Best, Francesco
@jpivarski do you know why we slice the values for string categories here?
https://github.com/scikit-hep/uproot5/blame/092bc679b215d565652c23b04a782dbf6ba38e3a/src/uproot/behaviors/TH1.py#L317-L318
I assume this is the reason for this error; the subsequent branch for weight / non-weight handling erroneously chooses the non-weight path:
https://github.com/scikit-hep/uproot5/blame/092bc679b215d565652c23b04a782dbf6ba38e3a/src/uproot/behaviors/TH1.py#L321-L325
Regardless of what we're supposed to do with sumw2 in the first sample, we probably ought to use the type of storage to determine whether to set weights, because that will better reflect our intention.
I think this is related to the problems I am facing in https://github.com/scikit-hep/uproot5/pull/764. I am looking into it and will probably also solve this issue.
This code reproduces the problem with the weights. I will add a test based on the code below.
import hist
print(f'{hist.__version__=}')
import uproot
print(f'{uproot.__version__=}')
import numpy as np
import ROOT
import pytest
newfile = "tmp.root"
h = ROOT.TH1D("h", "h", 10, 0., 1.)
h.FillRandom("gaus", 10000)
assert len(h.GetSumw2()) == 0 # it is supposed to be zero
fout = ROOT.TFile(newfile, "RECREATE")
h.Write()
fout.Close()
# open same hist with uproot
with uproot.open(newfile) as fin:
    h1 = fin["h"]
assert len(h1.axes) == 1
assert h1.axis(0).edges().tolist() == pytest.approx([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
assert len(h1.member("fSumw2")) == 0
# convert to hist
h2 = h1.to_hist()
assert str(h2.storage_type) == "<class 'boost_histogram.storage.Double'>"
assert len(h2.axes) == 1
# why is this failing? returns [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
assert h2.axes[0].edges.tolist() == pytest.approx([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
# write and read again
with uproot.recreate(newfile) as fout2:
    fout2["h"] = h2
with uproot.open(newfile) as fin2:
    h3 = fin2["h"]
# problems start here
assert len(h3.axes) == 1
assert h3.axis(0).edges().tolist() == pytest.approx([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
# ERROR
assert len(h3.member("fSumw2")) == 0, "this should be 0, but it's not!"