Problem with interpreting XYZVector

Open soleti opened this issue 3 years ago • 6 comments

Hello!

I am having trouble reading a Math::XYZVector object stored in the TTree of this file: TAtest2.root.txt.

>>> import uproot
>>> uproot.__version__
'4.0.0'
>>> file = uproot.open("TAtest2.root.txt")
>>> trkana = file['TrkAnaNeg/trkana']
>>> trkana['demcent/_mom'].array()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/miniconda3/lib/python3.8/site-packages/uproot/interpretation/numerical.py in basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
    327         try:
--> 328             output = data.view(dtype).reshape((-1,) + shape)
    329         except ValueError:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-5-fc2b20effd04> in <module>
----> 1 trkana['demcent/_mom'].array()

/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
   2057                         ranges_or_baskets.append((branch, basket_num, range_or_basket))
   2058 
-> 2059         _ranges_or_baskets_to_arrays(
   2060             self,
   2061             ranges_or_baskets,

/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays)
   3428 
   3429         elif isinstance(obj, tuple) and len(obj) == 3:
-> 3430             uproot.source.futures.delayed_raise(*obj)
   3431 
   3432         else:

/opt/miniconda3/lib/python3.8/site-packages/uproot/source/futures.py in delayed_raise(exception_class, exception_value, traceback)
     44         exec("raise exception_class, exception_value, traceback")
     45     else:
---> 46         raise exception_value.with_traceback(traceback)
     47 
     48 

/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in basket_to_array(basket)
   3375             basket_arrays = branchid_arrays[branch.cache_key]
   3376 
-> 3377             basket_arrays[basket.basket_num] = interpretation.basket_array(
   3378                 basket.data,
   3379                 basket.byte_offsets,

/opt/miniconda3/lib/python3.8/site-packages/uproot/interpretation/numerical.py in basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
    328             output = data.view(dtype).reshape((-1,) + shape)
    329         except ValueError:
--> 330             raise ValueError(
    331                 """basket {0} in tree/branch {1} has the wrong number of bytes ({2}) """
    332                 """for interpretation {3}

ValueError: basket 0 in tree/branch /TrkAnaNeg/trkana;1:demcent/_mom has the wrong number of bytes (3232) for interpretation AsStridedObjects(Model_ROOT_3a3a_Math_3a3a_DisplacementVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_float_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e__v1)
in file TAtest2.root.txt

Do you know if there is a way to interpret this kind of object correctly, since it is now the recommended alternative to TVector3 (https://root.cern.ch/doc/master/classTVector3.html)? Thanks!

soleti avatar Nov 22 '21 19:11 soleti

XYZVectors are not stored in a single branch, but in a group of branches. (Am I right about that? Check with trkana['demcent/_mom'].show().) There had been a few bugs in interpreting groups of branches, but they've been fixed since Uproot 4.0.0; the latest is 4.1.8.

You probably want to read each branch into components of a Vector. For instance,

import awkward as ak
import vector
vector.register_awkward()

xyz = trkana['demcent/_mom'].array()
array = ak.zip({"px": xyz["fX"], "py": xyz["fY"], "pz": xyz["fZ"]}, with_name="Momentum3D")

(or something similar; I'm writing this from memory.)

jpivarski avatar Nov 22 '21 20:11 jpivarski

Thank you for your reply Jim, but I still get the same error after upgrading. Also, it doesn't look like the XYZVector is being stored as a group of branches, see here:

import awkward as ak
import uproot
print(uproot.__version__)
import vector
vector.register_awkward()

file = uproot.open("TAtest2.root.txt")
trkana = file['TrkAnaNeg/trkana']
print(trkana['demcent/_mom'].show())
xyz = trkana['demcent/_mom'].array()
array = ak.zip({"px": xyz["fX"], "py": xyz["fY"], "pz": xyz["fZ"]}, with_name="Momentum3D")

This code returns:

4.1.8
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
_mom                 | ROOT::Math::Displacement | AsStridedObjects(Model_ROOT_3a
None
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/miniconda3/lib/python3.8/site-packages/uproot/interpretation/numerical.py in basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
    341         try:
--> 342             output = data.view(dtype).reshape((-1,) + shape)
    343         except ValueError:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-1-68794709770c> in <module>
      8 trkana = file['TrkAnaNeg/trkana']
      9 print(trkana['demcent/_mom'].show())
---> 10 xyz = trkana['demcent/_mom'].array()
     11 array = ak.zip({"px": xyz["fX"], "py": xyz["fY"], "pz": xyz["fZ"]}, with_name="Momentum3D")

/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
   2093                         ranges_or_baskets.append((branch, basket_num, range_or_basket))
   2094 
-> 2095         _ranges_or_baskets_to_arrays(
   2096             self,
   2097             ranges_or_baskets,

/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets)
   3508 
   3509         elif isinstance(obj, tuple) and len(obj) == 3:
-> 3510             uproot.source.futures.delayed_raise(*obj)
   3511 
   3512         else:

/opt/miniconda3/lib/python3.8/site-packages/uproot/source/futures.py in delayed_raise(exception_class, exception_value, traceback)
     44         exec("raise exception_class, exception_value, traceback")
     45     else:
---> 46         raise exception_value.with_traceback(traceback)
     47 
     48 

/opt/miniconda3/lib/python3.8/site-packages/uproot/behaviors/TBranch.py in basket_to_array(basket)
   3452             basket_arrays = branchid_arrays[branch.cache_key]
   3453 
-> 3454             basket_arrays[basket.basket_num] = interpretation.basket_array(
   3455                 basket.data,
   3456                 basket.byte_offsets,

/opt/miniconda3/lib/python3.8/site-packages/uproot/interpretation/numerical.py in basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
    342             output = data.view(dtype).reshape((-1,) + shape)
    343         except ValueError:
--> 344             raise ValueError(
    345                 """basket {0} in tree/branch {1} has the wrong number of bytes ({2}) """
    346                 """for interpretation {3}

ValueError: basket 0 in tree/branch /TrkAnaNeg/trkana;1:demcent/_mom has the wrong number of bytes (3232) for interpretation AsStridedObjects(Model_ROOT_3a3a_Math_3a3a_DisplacementVector3D_3c_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_float_3e2c_ROOT_3a3a_Math_3a3a_DefaultCoordinateSystemTag_3e__v1)
in file TAtest2.root.txt

soleti avatar Nov 22 '21 20:11 soleti

You're right: that's not a group (branch with subbranches). It's failing because "AsStridedObjects" means "try to interpret the buffer as

np.dtype([("fX", np.float64), ("fY", np.float64), ("fZ", np.float64)])

without doing any Python iteration," but the buffer does not have N × 3 × sizeof(component) bytes for any integer N. (I don't know what the component size is, whether it's really np.float64 or np.float32; I made that up by way of example.)
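To make the mismatch concrete, here's a quick arithmetic sketch using the 3232-byte basket size from the error message. The two record layouts are the same guesses as above (three float32 or three float64 components), so treat them as assumptions rather than the real encoding:

```python
import struct

basket_bytes = 3232  # "wrong number of bytes (3232)" from the error message

# ASSUMPTION: the record is exactly three components with nothing else in it.
candidates = {
    "3 x float32": struct.calcsize(">fff"),  # 12 bytes per record
    "3 x float64": struct.calcsize(">ddd"),  # 24 bytes per record
}

remainders = {}
for label, record_size in candidates.items():
    whole_records, leftover = divmod(basket_bytes, record_size)
    remainders[label] = leftover
    print(f"{label}: {whole_records} whole records, {leftover} bytes left over")
```

Neither layout divides the basket evenly, so a single cast of the buffer to either record type cannot succeed.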

Could I see the file? I can try to find out how it's really encoded, and whether AsStridedObjects can be used on this type at all.

jpivarski avatar Nov 22 '21 20:11 jpivarski

Thank you, I linked the file in the first comment, you can find it here: https://github.com/scikit-hep/uproot4/files/7583947/TAtest2.root.txt. Thanks for looking into this!

soleti avatar Nov 22 '21 20:11 soleti

Oh, thanks! I didn't see that.

What are some expected values in the first entry? I found the class name for this object:

>>> uproot.model.classname_decode(trkana['demcent/_mom'].interpretation.model.__name__)
('ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<float>,ROOT::Math::DefaultCoordinateSystemTag>', 1)

(That's a long one!) And here's its streamer:

>>> file.file.streamer_named('ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<float>,ROOT::Math::DefaultCoordinateSystemTag>').show()
ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<float>,ROOT::Math::DefaultCoordinateSystemTag> (v1)
    fCoordinates: ROOT::Math::Cartesian3D<float> (TStreamerObjectAny)

Okay, so it only contains one thing, a ROOT::Math::Cartesian3D<float>. Here's the streamer for that:

>>> file.file.streamer_named('ROOT::Math::Cartesian3D<float>').show()
ROOT::Math::Cartesian3D<float> (v1)
    fX: float (TStreamerBasicType)
    fY: float (TStreamerBasicType)
    fZ: float (TStreamerBasicType)

Okay, so all of the values are float32 (not float64).

Let's dump the raw bytes of the first event, first in a debugging form and then as an array:

>>> trkana['demcent/_mom'].debug(0)
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64   0   0  28   0   0  84  86  37 100  64   0   0  18   0   0 234 251 160  10
  @ --- --- --- --- ---   T   V   %   d   @ --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
193  50 176 181  66 170 107  13  66 103 210   9
---   2 --- ---   B ---   k ---   B   g --- ---
>>> trkana['demcent/_mom'].debug_array(0)
array([ 64,   0,   0,  28,   0,   0,  84,  86,  37, 100,  64,   0,   0,
        18,   0,   0, 234, 251, 160,  10, 193,  50, 176, 181,  66, 170,
       107,  13,  66, 103, 210,   9], dtype=uint8)

Oh, wait a minute: the non-strided interpretation works:

>>> slow_interpretation = uproot.interpretation.identify.interpretation_of(trkana['demcent/_mom'], {}, False)
>>> trkana['demcent/_mom'].array(slow_interpretation)
<Array [{fCoordinates: {fX: -11.2, ... ] type='101 * struct[["fCoordinates"], [s...'>
>>> trkana['demcent/_mom'].array(slow_interpretation)[0].tolist()
{'fCoordinates': {'fX': -11.16814136505127, 'fY': 85.2090835571289, 'fZ': 57.95511245727539}}

It's called "slow_interpretation" because it gives up trying to interpret the buffer in one NumPy array cast and instead iterates over it in Python. (That's the False in interpretation_of.) If this is successfully interpreting it, what code is it using?

>>> print(file.file.class_named('ROOT::Math::Cartesian3D<float>').known_versions[1].class_code)
class Model_ROOT_3a3a_Math_3a3a_Cartesian3D_3c_float_3e__v1(uproot.model.VersionedModel):
    def read_members(self, chunk, cursor, context, file):
        if self.is_memberwise:
            raise NotImplementedError(
                "memberwise serialization of {0}\nin file {1}".format(type(self).__name__, self.file.file_path)
            )
        self._members['fX'], self._members['fY'], self._members['fZ'] = cursor.fields(chunk, self._format0, context)
    ...
    _format0 = struct.Struct('>fff')
    _format_memberwise0 = struct.Struct('>f')
    _format_memberwise1 = struct.Struct('>f')
    _format_memberwise2 = struct.Struct('>f')
    base_names_versions = []
    member_names = ['fX', 'fY', 'fZ']
    class_flags = {}

Well, that's just three consecutive floats (struct.Struct('>fff')); I don't see what the problem is with the strided interpretation.

The problem is that there are 20 bytes of header data before the three floats in each entry. The slow_interpretation knows to skip them, but the strided interpretation does not. So for instance, a correct strided interpretation would be:

>>> strided_interpretation = uproot.AsDtype([("???", "S20"), ("fX", ">f4"), ("fY", ">f4"), ("fZ", ">f4")])
>>> trkana['demcent/_mom'].array(strided_interpretation)
<Array [...] type='101 * {"???": bytes, "fX": float32, "fY": float32, "fZ": floa...'>
>>> trkana['demcent/_mom'].array(strided_interpretation)[0].tolist()
{'???': b'@\x00\x00\x1c\x00\x00TV%d@\x00\x00\x12\x00\x00\xea\xfb\xa0\n', 'fX': -11.16814136505127, 'fY': 85.2090835571289, 'fZ': 57.95511245727539}

and we should just ignore the field named "???". I'm surprised that the slow_interpretation got those 20 bytes right: I don't see anything here that would tell it to skip them. The bug is in the conversion of a general (slow) interpretation into a strided one: either it should not have allowed the conversion, or it should have known to insert the 20-byte header. I'll keep this issue open until I can figure out how to make it do one of those automatically.
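As a sanity check on the 20-byte figure, here is a minimal sketch that decodes the first entry from the debug_array(0) dump using only the standard library's struct module. The last part is a guess: it assumes the 20 bytes are two nested groups of (byte count, version, checksum), with the byte count flagged by the 0x40000000 bit. That reading is consistent with the numbers in the dump, but it is an assumption, not something the streamers above confirm.

```python
import struct

# First entry of demcent/_mom, copied from the debug_array(0) output above.
entry = bytes([
     64,   0,   0,  28,   0,   0,  84,  86,  37, 100,
     64,   0,   0,  18,   0,   0, 234, 251, 160,  10,
    193,  50, 176, 181,  66, 170, 107,  13,  66, 103, 210,   9,
])

# 20 opaque header bytes followed by three big-endian float32 values.
record = struct.Struct(">20sfff")
header, fX, fY, fZ = record.unpack(entry)
print(fX, fY, fZ)

# The basket size from the error message is a whole number of such records.
entries, leftover = divmod(3232, record.size)
print(record.size, entries, leftover)

# ASSUMPTION: the header is (count, version, checksum) twice, where each
# count has the 0x40000000 flag bit set and gives the bytes that follow it.
BYTE_COUNT_FLAG = 0x40000000
(outer_count, outer_version, outer_checksum,
 inner_count, inner_version, inner_checksum) = struct.unpack(">IHIIHI", header)
outer_bytes = outer_count & ~BYTE_COUNT_FLAG
inner_bytes = inner_count & ~BYTE_COUNT_FLAG

# Each count should match the bytes remaining after its own 4-byte field.
print(outer_bytes, len(entry) - 4)
print(inner_bytes, len(entry) - 14)
```

The record size comes out to 32 bytes and 3232 bytes divide into 101 records with nothing left over, which matches the 101 entries shown in the array type above. Under the stated assumption, the two counts also agree with the bytes that remain after them, which would explain how the slow interpretation knows how much to skip.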

Anyway, to solve your problem, you need (for now) to pass a custom interpretation to array(), either

slow_interpretation = uproot.interpretation.identify.interpretation_of(trkana['demcent/_mom'], {}, False)

or

strided_interpretation = uproot.AsDtype([("???", "S20"), ("fX", ">f4"), ("fY", ">f4"), ("fZ", ">f4")])
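With the strided interpretation, every record still carries the "???" padding field. A small sketch of dropping it from the .tolist() form of a record follows; drop_placeholder is a hypothetical helper written for illustration, not part of Uproot, and the sample record is the first entry shown earlier.

```python
# One record as returned by .tolist() with the strided interpretation.
record = {
    "???": b"@\x00\x00\x1c\x00\x00TV%d@\x00\x00\x12\x00\x00\xea\xfb\xa0\n",
    "fX": -11.16814136505127,
    "fY": 85.2090835571289,
    "fZ": 57.95511245727539,
}


def drop_placeholder(rec, placeholder="???"):
    """Return a copy of rec without the padding field (hypothetical helper)."""
    return {name: value for name, value in rec.items() if name != placeholder}


momentum = drop_placeholder(record)
print(momentum)
```

On the Awkward Array itself, selecting fields by a list of names, such as array[["fX", "fY", "fZ"]], should give the same result without leaving array form.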

jpivarski avatar Nov 22 '21 21:11 jpivarski

Thank you very much for such a thorough investigation. I will use the custom interpretation for now.

soleti avatar Nov 22 '21 22:11 soleti