uproot5

Implement stl containers for RNTuple

Open Moelf opened this issue 2 years ago • 2 comments

See:

  • https://github.com/root-project/root/blob/master/tree/ntuple/v7/doc/specifications.md#stl-types-and-collections

And similar existing tests for TTree:

  • https://github.com/scikit-hep/uproot5/blob/main/tests/test_0031-test-stl-containers.py
  • https://github.com/scikit-hep/uproot5/blob/main/tests/test_0033-more-interpretations-2.py

Moelf, Aug 03 '22 23:08

Variant (Union) logic

We make a field of type std::variant<std::int32_t,double>, with actual content [1.0, 4, 3, 2, 1].

The field and column records look like the following:

# fields
([MetaData('Field, parent_field_id=0, struct_role=3, field_name='variant_int32_double', type_name='std::variant<std::int32_t,double>', type_alias='', field_desc=''),
  MetaData('Field, parent_field_id=0, struct_role=0, field_name='_0', type_name='std::int32_t', type_alias='', field_desc=''),
  MetaData('Field, parent_field_id=0, struct_role=0, field_name='_1', type_name='double', type_alias='', field_desc='')],
# columns
 [MetaData('ColumnRecordFrame', type=3, nbits=64, field_id=0, flags=0),
  MetaData('ColumnRecordFrame', type=11, nbits=32, field_id=1, flags=0),
  MetaData('ColumnRecordFrame', type=7, nbits=64, field_id=2, flags=0)])

Let's focus on the Switch column, which has column id 0. Treating the Switch column's content as uint64, we see [8589934592 4294967296 4294967297 4294967298 4294967299]. According to the specification, each number is split as follows:

Lower 44 bits like kIndex64, higher 20 bits are a dispatch tag to a column ID

We look at the raw bits of these five numbers with numpy.unpackbits:

raw_bits = numpy.unpackbits(D["column-0"].view(numpy.uint8))

0000000000000000000000000000000000000010000000000000000000000000
0000000000000000000000000000000000000001000000000000000000000000
0000000100000000000000000000000000000001000000000000000000000000
0000001000000000000000000000000000000001000000000000000000000000
0000001100000000000000000000000000000001000000000000000000000000

Since the last four elements of the actual content are all int32, I would expect the "higher 20 bits" to be the same for the last four numbers. But that doesn't work out: counting 20 bits in from the right-hand side doesn't reach the position where the "1" is.
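One thing that can make the dump above harder to read: numpy.unpackbits on a uint8 view walks each value's bytes in memory order, which on a little-endian machine means least-significant byte first, so the rows are not ordinary binary numbers. A minimal sketch, assuming only the five uint64 values quoted above, that prints them most-significant bit first with numpy.binary_repr:

```python
import numpy

# The five Switch values quoted above, typed in by hand (not read from a file).
switch = numpy.array(
    [8589934592, 4294967296, 4294967297, 4294967298, 4294967299],
    dtype=numpy.uint64,
)

# binary_repr pads to 64 characters and puts the most-significant bit on the left.
rows = [numpy.binary_repr(int(value), width=64) for value in switch]
for row in rows:
    print(row)
```

Read this way, the rightmost bits count up as 0, 0, 1, 2, 3, and the remaining set bit sits at bit position 32 or 33 counted from the right.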

ideas? @jpivarski


If we pretend the split is actually 32 bits / 32 bits, then we get reasonable results:

print(numpy.bitwise_and(D["column-0"], numpy.uint64(0x00000000ffffffff)))
print(D["column-0"] >> 32)

# kIndex
[0 0 1 2 3]
# column ids
[2 1 1 1 1]
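With kIndex and the column ids in hand, the variant's values can be looked up in the per-type columns. A minimal sketch, assuming column 1 holds the int32 values [4, 3, 2, 1] and column 2 holds the double value [1.0]; these two arrays are inferred from the stated content, not read from the file:

```python
import numpy

# kIndex and column ids as printed above.
index = numpy.array([0, 0, 1, 2, 3])
tags = numpy.array([2, 1, 1, 1, 1])

# Assumed per-type column contents (inferred, not read from the file).
columns = {
    1: numpy.array([4, 3, 2, 1], dtype=numpy.int32),  # std::int32_t alternative
    2: numpy.array([1.0], dtype=numpy.float64),       # double alternative
}

# For each entry, the tag picks the column and the index picks the element in it.
values = [columns[int(t)][int(i)].item() for t, i in zip(tags, index)]
print(values)  # [1.0, 4, 3, 2, 1]
```

This reproduces the original content [1.0, 4, 3, 2, 1], which is what makes the 32/32 reading look plausible for this file.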

Moelf, Aug 12 '22 05:08

Oh, it was only implemented very recently: https://github.com/root-project/root/commit/f4b676688a64cb7e7de7368561cab54a6aaaf1de


Can confirm: using 6.27, we can see the 20/44-bit split:

print(numpy.bitwise_and(D["column-0"], numpy.uint64(0x00000fffffffffff)))
print(D["column-0"] >> 44)

# kIndex (lower 44 bits)
[0 0 1 2 3]
# column ids (upper 20 bits)
[2 1 1 1 1]
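The same split can be wrapped in a small helper. This is a minimal sketch, not uproot's implementation: split_switch is a hypothetical name, and the packed values are synthetic ones built here for illustration rather than read from a file. The last index is deliberately large to show why the mask has to cover all 44 low bits; a narrower mask would give the same answer for small indices but would truncate this one.

```python
import numpy

INDEX_BITS = 44  # lower bits hold the kIndex64-like offset
INDEX_MASK = numpy.uint64((1 << INDEX_BITS) - 1)


def split_switch(values):
    """Split packed switch values into (index, tag) arrays (hypothetical helper)."""
    values = numpy.asarray(values, dtype=numpy.uint64)
    index = values & INDEX_MASK               # lower 44 bits
    tag = values >> numpy.uint64(INDEX_BITS)  # upper 20 bits
    return index, tag


# Synthetic input: pack tags and indices the same way, then split them again.
tags = numpy.array([2, 1, 1, 1, 1], dtype=numpy.uint64)
indices = numpy.array([0, 0, 1, 2, 1 << 30], dtype=numpy.uint64)
packed = (tags << numpy.uint64(INDEX_BITS)) | indices

index, tag = split_switch(packed)
print(index)
print(tag)
```

Keeping the bit width in one constant means the mask and the shift cannot drift apart if the layout changes again.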

Moelf, Aug 12 '22 11:08

>>> import numpy as np
>>> from pprint import pprint
>>> import skhep_testdata
>>> import uproot as up
>>> import awkward as ak
>>> filename = skhep_testdata.data_path("test_ntuple_stl_containers.root")
>>> r = up.open(filename)["ntuple"]
>>> pprint(r.arrays().to_list())
[{'string': 'one',
  'tuple_int32_string': {'_0': 1, '_1': 'one'},
  'variant_int32_float': 1.0,
  'vector_int32': [1],
  'vector_string': ['one'],
  'vector_tuple_int32_string': [{'_0': 1, '_1': 'one'}],
  'vector_variant_int32_float': [1],
  'vector_vector_int32': [[1]],
  'vector_vector_string': [['one']]},
 {'string': 'two',
  'tuple_int32_string': {'_0': 2, '_1': 'two'},
  'variant_int32_float': 2.0,
  'vector_int32': [1, 2],
  'vector_string': ['one', 'two'],
  'vector_tuple_int32_string': [{'_0': 1, '_1': 'one'}, {'_0': 2, '_1': 'two'}],
  'vector_variant_int32_float': [1, 2.0],
  'vector_vector_int32': [[1], [2]],
  'vector_vector_string': [['one'], ['two']]}]

Moelf, Aug 12 '22 22:08

@jpivarski eh, pre-commit CI is doing some stripping and then flake8 errors on the stripped stuff, what's going on?

Moelf, Aug 18 '22 13:08

Today's meeting will be a good chance to ask @henryiii about that. I've had to hold back pre-commit in #657 because the update is broken; I wonder if this is a partial update? It's definitely an incompatibility between the part that removes unnecessary "# noqa" and the part that needs those noqas.

jpivarski, Aug 18 '22 13:08

I wonder if changing the order helps? idk how it determines "unnecessary" without flake8 running through it

Moelf, Aug 18 '22 13:08