uproot5
Problem with interpreting XYZVector
Hello!
I am having trouble reading a Math::XYZVector object stored in this TTree: TAtest2.root.txt.
>>> import uproot
>>> uproot.__version__
'4.0.0'
>>> file = uproot.open("TAtest2.root")
>>> trkana = file['TrkAnaNeg/trkana']
>>> trkana['demcent/_mom'].array()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/opt/miniconda3/lib/python3.8/site-packages/uproot/interpretation/numerical.py in basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
327 try:
--> 328 output = data.view(dtype).reshape((-1,) + shape)
329 except ValueError:
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-5-fc2b20effd04> in <module>
----> 1 trkana['demcent/_mom'].array()
/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
2057 ranges_or_baskets.append((branch, basket_num, range_or_basket))
2058
-> 2059 _ranges_or_baskets_to_arrays(
2060 self,
2061 ranges_or_baskets,
/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays)
3428
3429 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3430 uproot.source.futures.delayed_raise(*obj)
3431
3432 else:
/opt/miniconda3/lib/python3.8/site-packages/uproot/source/futures.py in delayed_raise(exception_class, exception_value, traceback)
44 exec("raise exception_class, exception_value, traceback")
45 else:
---> 46 raise exception_value.with_traceback(traceback)
47
48
/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in basket_to_array(basket)
3375 basket_arrays = branchid_arrays[branch.cache_key]
3376
-> 3377 basket_arrays[basket.basket_num] = interpretation.basket_array(
3378 basket.data,
3379 basket.byte_offsets,
/opt/miniconda3/lib/python3.8/site-packages/uproot/interpretation/numerical.py in basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
328 output = data.view(dtype).reshape((-1,) + shape)
329 except ValueError:
--> 330 raise ValueError(
331 """basket {0} in tree/branch {1} has the wrong number of bytes ({2}) """
332 """for interpretation {3}
ValueError: basket 0 in tree/branch /TrkAnaNeg/trkana;1:demcent/_mom has the wrong number of bytes (3232) for interpretation AsStridedObjects(Model_ROOT_3a3a_Math_3a3a_DisplacementVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_float_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e__v1)
in file TAtest2.root.txt
Do you know if there is a way to interpret this kind of object correctly, since it is now the recommended alternative to TVector3 (https://root.cern.ch/doc/master/classTVector3.html)? Thanks!
XYZVectors are not stored in a single branch, but in a group of branches. (Am I right about that? Check with trkana['demcent/_mom'].show().) There were a few bugs in interpreting groups of branches, but they have been fixed since Uproot 4.0.0; the latest release is 4.1.8.
You probably want to read each branch into components of a Vector. For instance,
import awkward as ak
import vector
vector.register_awkward()
xyz = trkana['demcent/_mom'].array()
array = ak.zip({"px": xyz["fX"], "py": xyz["fY"], "pz": xyz["fZ"]}, with_name="Momentum3D")
(or something similar; I'm writing this from memory.)
Thank you for your reply, Jim, but I still get the same error after upgrading. Also, it doesn't look like the XYZVector is stored as a group of branches; see here:
import awkward as ak
import uproot
print(uproot.__version__)
import vector
vector.register_awkward()
file = uproot.open("TAtest2.root.txt")
trkana = file['TrkAnaNeg/trkana']
print(trkana['demcent/_mom'].show())
xyz = trkana['demcent/_mom'].array()
array = ak.zip({"px": xyz["fX"], "py": xyz["fY"], "pz": xyz["fZ"]}, with_name="Momentum3D")
This code returns:
4.1.8
name | typename | interpretation
---------------------+--------------------------+-------------------------------
_mom | ROOT::Math::Displacement | AsStridedObjects(Model_ROOT_3a
None
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/opt/miniconda3/lib/python3.8/site-packages/uproot/interpretation/numerical.py in basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
341 try:
--> 342 output = data.view(dtype).reshape((-1,) + shape)
343 except ValueError:
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-1-68794709770c> in <module>
8 trkana = file['TrkAnaNeg/trkana']
9 print(trkana['demcent/_mom'].show())
---> 10 xyz = trkana['demcent/_mom'].array()
11 array = ak.zip({"px": xyz["fX"], "py": xyz["fY"], "pz": xyz["fZ"]}, with_name="Momentum3D")
/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
2093 ranges_or_baskets.append((branch, basket_num, range_or_basket))
2094
-> 2095 _ranges_or_baskets_to_arrays(
2096 self,
2097 ranges_or_baskets,
/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets)
3508
3509 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3510 uproot.source.futures.delayed_raise(*obj)
3511
3512 else:
/opt/miniconda3/lib/python3.8/site-packages/uproot/source/futures.py in delayed_raise(exception_class, exception_value, traceback)
44 exec("raise exception_class, exception_value, traceback")
45 else:
---> 46 raise exception_value.with_traceback(traceback)
47
48
/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in basket_to_array(basket)
3452 basket_arrays = branchid_arrays[branch.cache_key]
3453
-> 3454 basket_arrays[basket.basket_num] = interpretation.basket_array(
3455 basket.data,
3456 basket.byte_offsets,
/opt/miniconda3/lib/python3.8/site-packages/uproot/interpretation/numerical.py in basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
342 output = data.view(dtype).reshape((-1,) + shape)
343 except ValueError:
--> 344 raise ValueError(
345 """basket {0} in tree/branch {1} has the wrong number of bytes ({2}) """
346 """for interpretation {3}
ValueError: basket 0 in tree/branch /TrkAnaNeg/trkana;1:demcent/_mom has the wrong number of bytes (3232) for interpretation AsStridedObjects(Model_ROOT_3a3a_Math_3a3a_DisplacementVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_float_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e__v1)
in file TAtest2.root.txt
You're right: that's not a group (a branch with subbranches). It's failing because "AsStridedObjects" means "try to interpret the buffer as
np.dtype([("fX", np.float64), ("fY", np.float64), ("fZ", np.float64)])
without doing any Python iteration," but the buffer does not have N × 3 × sizeof(component) bytes for any integer N. (I don't know what the component size is, whether it's really np.float64 or np.float32; I made that up by way of example.)
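A quick way to see the "for any integer N" problem: a strided cast can only succeed if the basket's byte count is an exact multiple of the record size. This stdlib sketch checks the 3232-byte basket from the error message against both candidate component sizes mentioned above; `fits_strided` is a hypothetical helper for illustration, not part of Uproot.

```python
import struct

BASKET_BYTES = 3232  # size of basket 0, from the error message above


def fits_strided(total_bytes: int, record_format: str) -> bool:
    """True if total_bytes splits into a whole number of records of this format."""
    return total_bytes % struct.calcsize(record_format) == 0


# Three big-endian float64 components: 24 bytes per record.
print(struct.calcsize(">ddd"), fits_strided(BASKET_BYTES, ">ddd"))
# Three big-endian float32 components: 12 bytes per record.
print(struct.calcsize(">fff"), fits_strided(BASKET_BYTES, ">fff"))
```

Both checks come out False (3232 leaves a remainder of 16 when divided by 24, and 4 when divided by 12), so neither guess at the component type can be cast in one step.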
Could I see the file? I can try to find out how it's really encoded, and whether AsStridedObjects can't be used on this type.
Thank you! I linked the file in the first comment; you can find it here: https://github.com/scikit-hep/uproot4/files/7583947/TAtest2.root.txt. Thanks for looking into this!
Oh, thanks! I didn't see that.
What are some expected values in the first entry? I found the class name for this object:
>>> uproot.model.classname_decode(trkana['demcent/_mom'].interpretation.model.__name__)
('ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<float>,ROOT::Math::DefaultCoordinateSystemTag>', 1)
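The mangled model name and the decoded C++ name line up in a simple way: each run of punctuation appears to be written as two-digit hex codes between underscores (3a = ':', 3c = '<', 3e = '>', 2c = ','), with a Model_ prefix and a _v<version> suffix. The sketch below reverses that scheme as inferred from this one example; `decode_model_name` is a hypothetical re-implementation for illustration, and uproot.model.classname_decode is the function to use in practice.

```python
import re

# Inferred from the example above; a sketch, not Uproot's own implementation.
_ENCODED_RUN = re.compile(r"_((?:[0-9a-f]{2})+)_")
_VERSION_SUFFIX = re.compile(r"_v(\d+)$")


def decode_model_name(name: str):
    """Turn an encoded model class name back into (C++ class name, version)."""
    if name.startswith("Model_"):
        name = name[len("Model_"):]
    version = None
    match = _VERSION_SUFFIX.search(name)
    if match:
        version = int(match.group(1))
        name = name[: match.start()]
    # Each run of hex pairs between underscores encodes punctuation characters.
    decoded = _ENCODED_RUN.sub(lambda m: bytes.fromhex(m.group(1)).decode("ascii"), name)
    return decoded, version


encoded = (
    "Model_ROOT_3a3a_Math_3a3a_DisplacementVector3D_3c_ROOT_3a3a_Math_3a3a_"
    "Cartesian3D_3c_float_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e__v1"
)
print(decode_model_name(encoded))
```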
(That's a long one!) And here's its streamer:
>>> file.file.streamer_named('ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<float>,ROOT::Math::DefaultCoordinateSystemTag>').show()
ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<float>,ROOT::Math::DefaultCoordinateSystemTag> (v1)
fCoordinates: ROOT::Math::Cartesian3D<float> (TStreamerObjectAny)
Okay, so it only contains one thing, a ROOT::Math::Cartesian3D<float>. Here's the streamer for that:
>>> file.file.streamer_named('ROOT::Math::Cartesian3D<float>').show()
ROOT::Math::Cartesian3D<float> (v1)
fX: float (TStreamerBasicType)
fY: float (TStreamerBasicType)
fZ: float (TStreamerBasicType)
Okay, so all of the values are float32 (not float64).
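Putting the two streamers together, the only payload per entry should be the three float members of Cartesian3D<float>. Here is a small sketch that derives the matching struct format from that member list. The type-name table is an illustrative assumption covering only float and double; the '>' prefix reflects that ROOT stores numbers big-endian.

```python
import struct

# Illustrative subset: ROOT basic type name -> struct format code.
BASIC_TYPE_CODES = {"float": "f", "double": "d"}

# Members of ROOT::Math::Cartesian3D<float>, as listed by its streamer.
cartesian3d_members = [("fX", "float"), ("fY", "float"), ("fZ", "float")]

# '>' means big-endian with standard sizes and no alignment padding.
record_format = ">" + "".join(BASIC_TYPE_CODES[t] for _, t in cartesian3d_members)
print(record_format, struct.calcsize(record_format))  # >fff 12
```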
Let's dump the raw bytes of the first event, first in a debugging form and then as an array:
>>> trkana['demcent/_mom'].debug(0)
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
64 0 0 28 0 0 84 86 37 100 64 0 0 18 0 0 234 251 160 10
@ --- --- --- --- --- T V % d @ --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
193 50 176 181 66 170 107 13 66 103 210 9
--- 2 --- --- B --- k --- B g --- ---
>>> trkana['demcent/_mom'].debug_array(0)
array([ 64, 0, 0, 28, 0, 0, 84, 86, 37, 100, 64, 0, 0,
18, 0, 0, 234, 251, 160, 10, 193, 50, 176, 181, 66, 170,
107, 13, 66, 103, 210, 9], dtype=uint8)
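Decoding those 32 bytes by hand with Python's struct module is a useful cross-check. Read as three big-endian float32 values, the last 12 bytes come out to roughly (-11.17, 85.21, 57.96). The 4-byte words at offsets 0 and 10 also look like ROOT byte-count words: with the 0x40000000 flag bit masked off (an assumption about the header format, not something established in this thread), they give 28 and 18, which are exactly the numbers of bytes that follow each of them in the entry.

```python
import struct

# Raw bytes of entry 0, copied from debug_array(0) above.
entry0 = bytes([
    64, 0, 0, 28, 0, 0, 84, 86, 37, 100, 64, 0, 0,
    18, 0, 0, 234, 251, 160, 10, 193, 50, 176, 181, 66, 170,
    107, 13, 66, 103, 210, 9,
])

BYTE_COUNT_FLAG = 0x40000000  # assumed ROOT byte-count marker bit

# Assumed header words at offsets 0 and 10; mask off the flag bit to get the counts.
(outer_word,) = struct.unpack_from(">I", entry0, 0)
(inner_word,) = struct.unpack_from(">I", entry0, 10)
outer_count = outer_word & ~BYTE_COUNT_FLAG
inner_count = inner_word & ~BYTE_COUNT_FLAG
print(len(entry0), outer_count, inner_count)  # 32 28 18

# Payload: three big-endian float32 values after the first 20 bytes.
fx, fy, fz = struct.unpack_from(">fff", entry0, 20)
print(round(fx, 3), round(fy, 3), round(fz, 3))
```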
Oh, wait a minute: the non-strided interpretation works:
>>> slow_interpretation = uproot.interpretation.identify.interpretation_of(trkana['demcent/_mom'], {}, False)
>>> trkana['demcent/_mom'].array(slow_interpretation)
<Array [{fCoordinates: {fX: -11.2, ... ] type='101 * struct[["fCoordinates"], [s...'>
>>> trkana['demcent/_mom'].array(slow_interpretation)[0].tolist()
{'fCoordinates': {'fX': -11.16814136505127, 'fY': 85.2090835571289, 'fZ': 57.95511245727539}}
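For comparison, a per-entry loop in the spirit of a non-strided interpretation could look like the sketch below. It is not Uproot's code: the header layout it skips (a 4-byte count word with the 0x40000000 flag, a 2-byte version, and a 4-byte checksum when the version is 0) is an assumption chosen to fit the bytes dumped above, and the second entry in the demo basket is made up.

```python
import struct

BYTE_COUNT_FLAG = 0x40000000  # assumed ROOT byte-count marker bit


def skip_header(buf: bytes, pos: int) -> int:
    """Skip one assumed object header and return the new position."""
    (word,) = struct.unpack_from(">I", buf, pos)
    if not word & BYTE_COUNT_FLAG:
        raise ValueError(f"no byte-count word at offset {pos}")
    pos += 4
    (version,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    if version == 0:
        pos += 4  # assumed 4-byte checksum when the version is 0
    return pos


def read_entries_slowly(basket: bytes, byte_offsets):
    """Decode one entry at a time in a Python loop."""
    entries = []
    for start, stop in zip(byte_offsets[:-1], byte_offsets[1:]):
        pos = skip_header(basket, start)  # DisplacementVector3D header
        pos = skip_header(basket, pos)    # Cartesian3D header
        fx, fy, fz = struct.unpack_from(">fff", basket, pos)
        if pos + 12 != stop:
            raise ValueError(f"entry at offset {start} has an unexpected length")
        entries.append({"fCoordinates": {"fX": fx, "fY": fy, "fZ": fz}})
    return entries


# Entry 0's bytes from debug_array(0), plus a made-up second entry that reuses
# the same 20 header bytes with payload (1.0, 2.0, 3.0).
entry0 = bytes([
    64, 0, 0, 28, 0, 0, 84, 86, 37, 100, 64, 0, 0,
    18, 0, 0, 234, 251, 160, 10, 193, 50, 176, 181, 66, 170,
    107, 13, 66, 103, 210, 9,
])
fake_entry = entry0[:20] + struct.pack(">fff", 1.0, 2.0, 3.0)
basket = entry0 + fake_entry
print(read_entries_slowly(basket, [0, 32, 64])[1])
# {'fCoordinates': {'fX': 1.0, 'fY': 2.0, 'fZ': 3.0}}
```

The header is parsed rather than assumed to be a fixed size, which is the practical advantage of iterating in Python: each entry can be checked as it is read.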
It's called "slow_interpretation" because it gives up on interpreting the buffer with a single NumPy array cast and instead iterates over it in Python. (That's what the False in interpretation_of selects.) Since this interpretation succeeds, what code is it using?
>>> print(file.file.class_named('ROOT::Math::Cartesian3D<float>').known_versions[1].class_code)
class Model_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_float_3e__v1(uproot.model.VersionedModel):
    def read_members(self, chunk, cursor, context, file):
        if self.is_memberwise:
            raise NotImplementedError(
                "memberwise serialization of {0}\nin file {1}".format(type(self).__name__, self.file.file_path)
            )
        self._members['fX'], self._members['fY'], self._members['fZ'] = cursor.fields(chunk, self._format0, context)
    ...
    _format0 = struct.Struct('>fff')
    _format_memberwise0 = struct.Struct('>f')
    _format_memberwise1 = struct.Struct('>f')
    _format_memberwise2 = struct.Struct('>f')
    base_names_versions = []
    member_names = ['fX', 'fY', 'fZ']
    class_flags = {}
Well, that's just three consecutive floats (struct.Struct('>fff')); I don't see what the problem is with the strided interpretation.
The problem is that there are 20 bytes of other data before the three floats. The slow_interpretation knows to skip them, but the strided interpretation does not. So, for instance, a correct strided interpretation would be:
>>> strided_interpretation = uproot.AsDtype([("???", "S20"), ("fX", ">f4"), ("fY", ">f4"), ("fZ", ">f4")])
>>> trkana['demcent/_mom'].array(strided_interpretation)
<Array [...] type='101 * {"???": bytes, "fX": float32, "fY": float32, "fZ": floa...'>
>>> trkana['demcent/_mom'].array(strided_interpretation)[0].tolist()
{'???': b'@\x00\x00\x1c\x00\x00TV%d@\x00\x00\x12\x00\x00\xea\xfb\xa0\n', 'fX': -11.16814136505127, 'fY': 85.2090835571289, 'fZ': 57.95511245727539}
and we should just ignore the field named "???". I'm surprised that the slow_interpretation got those 20 bytes right: I don't see anything here that would tell it to skip them. The bug is in the conversion of a general (slow) interpretation into a strided one: it should either have refused the conversion or have known to insert the 20-byte header. Until I can make it do one of those things automatically, I shouldn't close this issue.
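The same failure and fix can be reproduced without Uproot, using the struct module's iterative unpacking as a rough stand-in for a NumPy dtype cast. A 12-byte record format cannot tile a 3232-byte buffer, while a 32-byte format with 20 opaque leading bytes (the analogue of the "S20" field named "???") can. The demo basket simply repeats entry 0 101 times, so it has the right size but is not the real data.

```python
import struct

# Entry 0's bytes from debug_array(0), repeated 101 times as a stand-in basket
# of the same size (3232 bytes) as the real one. Not the real data.
entry0 = bytes([
    64, 0, 0, 28, 0, 0, 84, 86, 37, 100, 64, 0, 0,
    18, 0, 0, 234, 251, 160, 10, 193, 50, 176, 181, 66, 170,
    107, 13, 66, 103, 210, 9,
])
basket = entry0 * 101

# Roughly what the automatic strided interpretation assumes: only 3 floats per entry.
try:
    list(struct.iter_unpack(">fff", basket))
    naive_ok = True
except struct.error as err:
    naive_ok = False
    print("3 floats per entry:", err)

# Like the AsDtype workaround: 20 bytes to ignore, then the 3 floats.
records = [(fx, fy, fz) for _skip, fx, fy, fz in struct.iter_unpack(">20sfff", basket)]
print(len(records), struct.calcsize(">20sfff"))  # 101 32
```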
Anyway, to solve your problem, you need (for now) to pass a custom interpretation, either
slow_interpretation = uproot.interpretation.identify.interpretation_of(trkana['demcent/_mom'], {}, False)
or
strided_interpretation = uproot.AsDtype([("???", "S20"), ("fX", ">f4"), ("fY", ">f4"), ("fZ", ">f4")])
Thank you very much for such a thorough investigation. I will use the custom interpretation for now.