Incorrect writing of jagged arrays into a TTree
When a jagged array is written into a TTree in an output ROOT file and the array is larger than the TBasket, a segmentation violation occurs when attempting to read the data in a ROOT session. Uproot stores an incorrect number of baskets, which possibly leads to the issue in ROOT.
Simple example:
import awkward as ak
import numpy as np
import uproot as ur
of = ur.recreate('test.root')
of['tree'] = {'wfm': ak.from_regular(ak.Array(np.arange(32000*5*6, dtype='float').reshape(32000, 5, 6)), axis=-2)}
of['tree'].num_baskets  # this is equal to 1
of.close()
After opening in ROOT (note the size of the basket and number of baskets for branch wfm vs the size of the branch):
$ root test.root
root [0]
Attaching file test.root as _file0...
(TFile *) 0x7f9e8df4a730
root [1] tree->Print()
******************************************************************************
*Tree :tree : *
*Entries : 32000 : Total = 7937482 bytes File Size = 1410225 *
* : : Tree compression factor = 5.63 *
******************************************************************************
*Br 0 :nwfm : nwfm/I *
*Entries : 32000 : Total Size= 128552 bytes File Size = 729 *
*Baskets : 1 : Basket Size= 32000 bytes Compression= 175.68 *
*............................................................................*
*Br 1 :wfm : wfm[nwfm][6]/D *
*Entries : 32000 : Total Size= 7808656 bytes File Size = 1408155 *
*Baskets : 1 : Basket Size= 32000 bytes Compression= 5.54 *
*............................................................................*
root [2] tree->Draw("wfm")
Info in <TCanvas::MakeDefCanvas>: created default TCanvas with name c1
Warning in <TBasket::ReadBasketBuffers>: basket:wfm has fNevBuf=32000 but fEntryOffset=0, pos=22484, len=7808078, fNbytes=1408155, fObjlen=7808008, trying to repair
*** Break *** segmentation violation
...
I note here that if the stored array is not jagged, ROOT gives no segmentation violation even though the size of the branch is larger than a single basket and the number of baskets for the branch is still 1.
In addition, the data cannot be read back even with Uproot.
>>> with ur.open('test.root:tree') as t:
...     d = t.arrays()
...
Traceback (most recent call last):
File "/usr/local/lib/python3.13/site-packages/uproot/interpretation/numerical.py", line 359, in basket_array
output = data.view(dtype).reshape((-1, *shape))
ValueError: cannot reshape array of size 20000 into shape (6)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.13/site-packages/uproot/behaviors/TBranch.py", line 889, in arrays
_ranges_or_baskets_to_arrays(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
self,
^^^^^
...<9 lines>...
interp_options,
^^^^^^^^^^^^^^^
)
^
File "/usr/local/lib/python3.13/site-packages/uproot/behaviors/TBranch.py", line 3204, in _ranges_or_baskets_to_arrays
uproot.source.futures.delayed_raise(*obj)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "/usr/local/lib/python3.13/site-packages/uproot/source/futures.py", line 38, in delayed_raise
raise exception_value.with_traceback(traceback)
File "/usr/local/lib/python3.13/site-packages/uproot/behaviors/TBranch.py", line 3140, in basket_to_array
basket_array = interpretation.basket_array(
basket.data,
...<6 lines>...
interp_options,
)
File "/usr/local/lib/python3.13/site-packages/uproot/interpretation/jagged.py", line 180, in basket_array
content = self._content.basket_array(
data, None, basket, branch, context, cursor_offset, library, options
)
File "/usr/local/lib/python3.13/site-packages/uproot/interpretation/numerical.py", line 361, in basket_array
raise ValueError(
...<3 lines>...
) from err
ValueError: basket 0 in tree/branch /tree;1:wfm has the wrong number of bytes (160000) for interpretation AsDtype("('>f8', (6,))")
in file test.root
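The sizes quoted in the two Uproot error messages can be checked by hand. The short calculation below is only arithmetic on the numbers reported above (it does not call ROOT or Uproot), and the variable names are mine rather than anything from the library:

```python
# Numbers copied from the error messages above.
basket_bytes = 160000   # "wrong number of bytes (160000)"
itemsize = 8            # '>f8' is an 8-byte big-endian float
inner_len = 6           # inner shape (6,) of the interpretation

n_values = basket_bytes // itemsize   # float64 values found in the basket
leftover = n_values % inner_len       # values that do not fill a row of 6

# Payload that 32000 entries of shape (5, 6) in float64 should occupy.
expected_bytes = 32000 * 5 * 6 * itemsize

print(n_values)        # 20000, the size reported by the reshape error
print(leftover)        # 2, so the buffer cannot be reshaped into rows of 6
print(expected_bytes)  # 7680000
```

So the reader is handed far fewer bytes than the array payload should occupy, and the count is not a multiple of the inner dimension, which is why the reshape fails.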
Thank you for reporting this, @vpec0! I'll look into it.
As far as I know, Uproot has no mechanism for splitting an array into multiple baskets, so it makes sense that it always writes a single basket for the branch.
It doesn't seem like the issue is related to the size of the array being larger than what the TBasket can hold, because even a small array fails to be written correctly.
>>> of = ur.recreate('test.root')
>>> of['tree'] = {'wfm': ak.from_regular(ak.Array(np.arange(32*5*6, dtype='float').reshape(-1, 5, 6)))}
>>> print(of['tree'].num_baskets)  # this is equal to 1
>>> of.close()
>>> with ur.open('test.root:tree') as t:
...     d = t.arrays()
So it looks like the issue happens when it is a jagged array of regular arrays. I'm surprised that this wasn't tested.
I'm not very familiar with the TTree writing parts, so it might take me some time to figure it out.
In the meantime, I'd invite you to switch to using RNTuples instead of a TTree. We intend to fully support RNTuples, as opposed to the limited support that we offer for TTrees. Here's how you can do it.
>>> of = ur.recreate('test.root')
>>> data = {'wfm': ak.from_regular(ak.Array(np.arange(32*5*6, dtype='float').reshape(-1, 5, 6)))}
>>> of.mkrntuple("rntuple", data)
>>> of.close()
>>> with ur.open('test.root:rntuple') as t:
...     d = t.arrays()
> As far as I know, Uproot has no mechanism for splitting an array into multiple baskets, so it makes sense that it always writes a single basket for the branch.
By the way, there have been cases in analysis code where having this mechanism in Uproot would have been very beneficial, instead of looping over chunks of the array and appending to the tree to write multiple TBaskets manually.
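The manual approach mentioned here can be sketched roughly as follows. It assumes the usual Uproot pattern in which each call to a writable tree's `extend` appends one more TBasket per branch. `write_in_chunks` and `chunk_size` are hypothetical names, and the helper relies only on the target having an `extend` method and on the arrays supporting `len` and slicing:

```python
def write_in_chunks(tree, arrays, chunk_size):
    """Append `arrays` (a dict of equal-length, sliceable arrays) to `tree`
    in pieces of at most `chunk_size` entries, one `extend` call per piece.

    Returns the number of `extend` calls made.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    lengths = {len(a) for a in arrays.values()}
    if len(lengths) != 1:
        raise ValueError("all arrays must have the same number of entries")
    (n_entries,) = lengths

    n_calls = 0
    for start in range(0, n_entries, chunk_size):
        stop = min(start + chunk_size, n_entries)
        # Slice every branch to the same entry range so they stay aligned.
        tree.extend({name: a[start:stop] for name, a in arrays.items()})
        n_calls += 1
    return n_calls
```

With Uproot, `tree` would be a writable tree such as the one returned by `mktree`. I have not tested this against the jagged-array-of-regular-arrays case from this issue, so it is not a workaround for the bug itself.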
> In the meantime, I'd invite you to switch to using RNTuples instead of a TTree. We intend to fully support RNTuples, as opposed to the limited support that we offer for TTrees.
I don't mind using RNTuples; however, my analysis data contains nested records. For example:
print(data.typestr)
23 * {run: uint32, event: uint32, hasT0: uint16, end_xyz: 3 * float32, t0: float32, flash_time: var * float32, avg_wfm: var * float64, peak: var * {idx: int64, prominence: float64, ipr: float64}, wfm: var * 1024 * float64}
mkrntuple cannot write that data for me:
...
File "/usr/local/lib/python3.13/site-packages/uproot/writing/_cascadentuple.py", line 496, in _build_field_col_records
raise NotImplementedError(f"Form type {type(akform)} cannot be written yet")
NotImplementedError: Form type <class 'awkward.forms.indexedform.IndexedForm'> cannot be written yet
Or is there a workaround?
@ikrommyd yeah, that's definitely a useful feature and I'm planning to add it for RNTuples.
@vpec0 Sorry about that. It currently supports nested records, but a few Awkward form types are still missing. The one you point out is pretty easy to implement, so I'll aim to add it by the next release.
@vpec0 I opened #1493 to add support for IndexedArrays. We'll have a release of Uproot next week, so it should work by then.
Hi @vpec0, could you please check the new release and let us know whether the fix from https://github.com/scikit-hep/uproot5/pull/1493 resolves it? Thanks!
I have tested the workaround that uses RNTuple with the latest Uproot version, 5.6.5. I still get an error when attempting to write the data:
print(data)
outf.mkrntuple(tree_name,data)
gives the following error:
outf.mkrntuple(tree_name,data)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/uproot/writing/writable.py", line 1376, in mkrntuple
ntuple.extend(ak_data)
~~~~~~~~~~~~~^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/uproot/writing/writable.py", line 2180, in extend
self._cascading.extend(self._file, self._file.sink, data)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/uproot/writing/_cascadentuple.py", line 971, in extend
content = data_buffers[f"{next_barekey}-data"]
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'node8-data'
Command exited with non-zero status 1
@ianna I am assuming nothing has been done on the TTree side, so there is no point in testing that, right?
@vpec0 nothing has been done on the TTree side yet.
Would you be able to pickle the data you're trying to save and attach it here so I can debug it? I must have missed something while implementing IndexedArrays.
Never mind, I know what's wrong. I'll work on fixing it.
Good, because I could not reproduce the error after I had pickled the array.
@vpec0 could you try using #1496 to see if it finally works?
I think I have tried #1496 by pip-installing it in a venv:
python -m venv venv
. venv/bin/activate
pip install git+https://github.com/scikit-hep/uproot5.git@ariostas/fix_index_arrays
...
After running my script that tries to save the RNTuple, I still get an error.
outf.mkrntuple(tree_name,ak.Array(data))
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../venv/lib/python3.13/site-packages/uproot/writing/writable.py", line 1371, in mkrntuple
ntuple.extend(ak_form_or_data)
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File ".../venv/lib/python3.13/site-packages/uproot/writing/writable.py", line 2180, in extend
self._cascading.extend(self._file, self._file.sink, data)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../venv/lib/python3.13/site-packages/uproot/writing/_cascadentuple.py", line 984, in extend
col_data = data_buffers[key]
~~~~~~~~~~~~^^^^^
KeyError: 'node14-offsets'
Command exited with non-zero status 1
Okay, thanks! I'll keep looking into it.
Could you send me the output of ak.Array(data).layout.form so I can debug it? I'm not really sure what could be happening.
Here you go:
{
"class": "RecordArray",
"fields": [
"run",
"event",
"hasT0",
"end_xyz",
"t0",
"flash_time",
"avg_wfm",
"peak",
"wfm"
],
"contents": [
{
"class": "IndexedArray",
"index": "i64",
"content": "uint32"
},
{
"class": "IndexedArray",
"index": "i64",
"content": "uint32"
},
{
"class": "IndexedArray",
"index": "i64",
"content": "uint16"
},
{
"class": "IndexedArray",
"index": "i64",
"content": {
"class": "RegularArray",
"size": 3,
"content": {
"class": "IndexedArray",
"index": "i64",
"content": "float32"
}
}
},
{
"class": "IndexedArray",
"index": "i64",
"content": "float32"
},
{
"class": "IndexedArray",
"index": "i64",
"content": {
"class": "ListOffsetArray",
"offsets": "i64",
"content": {
"class": "IndexedArray",
"index": "i64",
"content": "float32"
}
}
},
{
"class": "IndexedArray",
"index": "i64",
"content": {
"class": "ListArray",
"starts": "i64",
"stops": "i64",
"content": "float64"
}
},
{
"class": "ListOffsetArray",
"offsets": "i64",
"content": {
"class": "RecordArray",
"fields": [
"idx",
"prominence",
"ipr"
],
"contents": [
"int64",
"float64",
"float64"
]
}
},
{
"class": "IndexedArray",
"index": "i64",
"content": {
"class": "ListOffsetArray",
"offsets": "i64",
"content": {
"class": "RegularArray",
"size": 1024,
"content": "float64"
}
}
}
]
}
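For readers unfamiliar with the node types in this form: an IndexedArray stores an integer `index` into a separate `content` buffer, so the logical values are `content[index[i]]`. The toy model below uses plain Python lists and is not Awkward's actual implementation (`project` is a name I made up). It only illustrates that the index has to be resolved before the values can be laid out as a flat column, which is presumably why a writer needs explicit support for this node type:

```python
def project(index, content):
    """Return the logical values of a toy IndexedArray as a flat list."""
    out = []
    for i in index:
        if not 0 <= i < len(content):
            raise IndexError(
                f"index {i} is out of range for content of length {len(content)}"
            )
        out.append(content[i])
    return out


# The content buffer may hold values in a different order, or values that
# are never referenced; only the index defines what the array "contains".
content = [10.0, 20.0, 30.0, 40.0]
index = [3, 0, 0, 2]

print(project(index, content))  # [40.0, 10.0, 10.0, 30.0]
```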
@vpec0 Thanks so much for the help! Could you try again? I think now it should be good, but I'll add more tests to my PR to make sure I didn't miss anything.
Writing the RNTuple proceeded without any error.
However, attempts to read it back fail:
with ur.open('testfile.root:rntuple_name') as t:
    d = t.arrays()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 t.arrays()
File .../venv/lib/python3.13/site-packages/uproot/behaviors/RNTuple.py:774, in HasFields.arrays(self, expressions, cut, filter_name, filter_typename, filter_field, aliases, language, entry_start, entry_stop, decompression_executor, array_cache, library, backend, interpreter, ak_add_doc, how, interpretation_executor, filter_branch)
772 entry_start -= cluster_offset
773 entry_stop -= cluster_offset
--> 774 arrays = uproot.extras.awkward().from_buffers(
775 form,
776 cluster_num_entries,
777 container_dict,
778 allow_noncanonical_form=True,
779 backend="cuda" if interpreter == "gpu" and backend == "cuda" else "cpu",
780 )[entry_start:entry_stop]
782 arrays = uproot.extras.awkward().to_backend(arrays, backend=backend)
783 # no longer needed; save memory
File .../venv/lib/python3.13/site-packages/awkward/_dispatch.py:41, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
38 @wraps(func)
39 def dispatch(*args, **kwargs):
40 # NOTE: this decorator assumes that the operation is exposed under `ak.`
---> 41 with OperationErrorContext(name, args, kwargs):
42 gen_or_result = func(*args, **kwargs)
43 if isgenerator(gen_or_result):
File .../venv/lib/python3.13/site-packages/awkward/_errors.py:80, in ErrorContext.__exit__(self, exception_type, exception_value, traceback)
78 self._slate.__dict__.clear()
79 # Handle caught exception
---> 80 raise self.decorate_exception(exception_type, exception_value)
81 else:
82 # Step out of the way so that another ErrorContext can become primary.
83 if self.primary() is self:
File .../venv/lib/python3.13/site-packages/awkward/_dispatch.py:42, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
38 @wraps(func)
39 def dispatch(*args, **kwargs):
40 # NOTE: this decorator assumes that the operation is exposed under `ak.`
41 with OperationErrorContext(name, args, kwargs):
---> 42 gen_or_result = func(*args, **kwargs)
43 if isgenerator(gen_or_result):
44 array_likes = next(gen_or_result)
File .../venv/lib/python3.13/site-packages/awkward/operations/ak_from_buffers.py:110, in from_buffers(form, length, container, buffer_key, backend, byteorder, allow_noncanonical_form, highlevel, behavior, attrs)
29 @high_level_function()
30 def from_buffers(
31 form,
(...) 41 attrs=None,
42 ):
43 """
44 Args:
45 form (#ak.forms.Form or str/dict equivalent): The form of the Awkward
(...) 108 See #ak.to_buffers for examples.
109 """
--> 110 return _impl(
111 form,
112 length,
113 container,
114 buffer_key,
115 backend,
116 byteorder,
117 highlevel,
118 behavior,
119 attrs,
120 allow_noncanonical_form,
121 )
File .../venv/lib/python3.13/site-packages/awkward/operations/ak_from_buffers.py:157, in _impl(form, length, container, buffer_key, backend, byteorder, highlevel, behavior, attrs, simplify)
151 raise TypeError(
152 "'form' argument must be a Form or its Python dict/JSON string representation"
153 )
155 getkey = regularize_buffer_key(buffer_key)
--> 157 out = _reconstitute(
158 form,
159 length,
160 container,
161 getkey,
162 backend,
163 byteorder,
164 simplify,
165 field_path=(),
166 shape_generator=lambda: (length,),
167 )
169 return wrap_layout(out, highlevel=highlevel, attrs=attrs, behavior=behavior)
File .../venv/lib/python3.13/site-packages/awkward/operations/ak_from_buffers.py:620, in _reconstitute(form, length, container, getkey, backend, byteorder, simplify, field_path, shape_generator)
611 return ak.contents.RegularArray(
612 content,
613 form.size,
614 length,
615 parameters=form._parameters,
616 )
618 elif isinstance(form, ak.forms.RecordForm):
619 contents = [
--> 620 _reconstitute(
621 content,
622 length,
623 container,
624 getkey,
625 backend,
626 byteorder,
627 simplify,
628 (*field_path, field),
629 shape_generator,
630 )
631 for content, field in zip(form.contents, form.fields)
632 ]
633 return ak.contents.RecordArray(
634 contents,
635 None if form.is_tuple else form.fields,
(...) 638 backend=backend,
639 )
641 elif isinstance(form, ak.forms.UnionForm):
File .../venv/lib/python3.13/site-packages/awkward/operations/ak_from_buffers.py:572, in _reconstitute(form, length, container, getkey, backend, byteorder, simplify, field_path, shape_generator)
569 else:
570 next_length = _adjust_length(offsets)
--> 572 content = _reconstitute(
573 form.content,
574 next_length,
575 container,
576 getkey,
577 backend,
578 byteorder,
579 simplify,
580 field_path,
581 _shape_generator,
582 )
583 return ak.contents.ListOffsetArray(
584 ak.index.Index(offsets),
585 content,
586 parameters=form._parameters,
587 )
589 elif isinstance(form, ak.forms.RegularForm):
File .../venv/lib/python3.13/site-packages/awkward/operations/ak_from_buffers.py:282, in _reconstitute(form, length, container, getkey, backend, byteorder, simplify, field_path, shape_generator)
279 (length,) = shape_generator()
280 return (_adjust_length(length),)
--> 282 data = _from_buffer(
283 backend.nplike,
284 raw_array,
285 dtype=dtype,
286 count=real_length,
287 byteorder=byteorder,
288 field_path=field_path,
289 shape_generator=_shape_generator,
290 )
291 if form.inner_shape != ():
292 data = backend.nplike.reshape(data, (length, *form.inner_shape))
File .../venv/lib/python3.13/site-packages/awkward/operations/ak_from_buffers.py:242, in _from_buffer(nplike, buffer, dtype, count, byteorder, field_path, shape_generator)
240 if not (isinstance(nplike, Jax) and nplike.is_currently_tracing()):
241 if array.size < count:
--> 242 raise TypeError(
243 f"size of array ({array.size}) is less than size of form ({count})"
244 )
245 return array[:count]
246 else:
TypeError: size of array (115496) is less than size of form (192244)
This error occurred while calling
ak.from_buffers(
RecordForm-instance
int64-instance
{'column-0-data': array([28867, 28867, 28867, 28867, 28867, 28867, 28...
allow_noncanonical_form = True
backend = 'cpu'
)
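The final `TypeError` comes from a length-consistency check visible in the traceback (`array.size < count`): the form implies a certain number of values for a buffer, and the buffer that was read is shorter. For a jagged (ListOffsetArray-like) node, the required content length is the last offset. A toy version of that check, in plain Python and not taken from Awkward's code (`check_content_length` is a hypothetical helper), might look like this:

```python
def check_content_length(offsets, content):
    """Verify that `content` is long enough for the lists described by `offsets`.

    `offsets` has one more element than there are lists; list i spans
    content[offsets[i]:offsets[i + 1]].
    """
    if len(offsets) == 0:
        raise ValueError("offsets must contain at least one element")
    required = offsets[-1]
    if len(content) < required:
        raise TypeError(
            f"size of array ({len(content)}) is less than size of form ({required})"
        )
    return [content[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


print(check_content_length([0, 2, 2, 5], [1, 2, 3, 4, 5]))  # [[1, 2], [], [3, 4, 5]]

try:
    check_content_length([0, 2, 2, 5], [1, 2, 3])  # content too short
except TypeError as err:
    print(err)  # size of array (3) is less than size of form (5)
```

In the report above the same mismatch appears with 115496 values available and 192244 required, which suggests the written offsets and the written content disagree for at least one column.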
@vpec0 I'm so sorry it still doesn't work. Would it be easy/possible for you to somehow send me your data and code so that I can debug it and we don't have to keep going back and forth? You could attach it here, put it on CERN drive, or email it to me at ariostas[at]princeton.edu, whichever is more convenient for you.
@ariostas I have sent you via email the path to the data file on CERN's eos and a link to my gitlab.cern.ch repository where the code is.
When I run my code against ca079b083, I get errors (the same as above, https://github.com/scikit-hep/uproot5/issues/1490#issuecomment-3279632621). However, after upgrading to 9ef23fdb1, writing to the RNTuple and reading it back works.
Thank you for the patience and help, @vpec0! I'm glad that it finally works. At some point I'll take a look at the TTree side, but that's less of a priority for me right now.