uproot5

RNTuple fields have 4 unexpected bytes when the recent ROOT master version is used

Open • giedrius2020 opened this issue 1 year ago • 1 comment

Version and files:

  • uproot version: 5.3.12
  • ROOT versions used to create the RNTuple files: ROOT_632 (6.32) and ROOT_6_X (recent master).
  • The file that reproduces the issue must be created with the ROOT RNTuple converter. Instructions for creating it are in this gist: https://gist.github.com/davidlange6/604f60f8a684b16538f4042bc96a8f18

Issue summary:

After converting files to RNTuple with the ROOT converter, we tried to read them with uproot. With files written by ROOT_632, uproot read all arrays successfully. With files written by the ROOT_6_X converter, however, uproot could not read the file.

Issue analysis:

When calling keys() on the RNTuple, the following error occurred:

File ~/uproot5/src/uproot/source/chunk.py:452, in Chunk.get(self, start, stop, cursor, context)
    449             return self._raw_data[local_start:local_stop]
    451         else:
--> 452             raise uproot.deserialization.DeserializationError(
    453                 f"""attempting to get bytes {start}:{stop}
    454 outside expected range {self._start}:{self._stop} for this Chunk""",
    455                 self,
    456                 cursor.copy(),
    457                 context,
    458                 self._source.file_path,
    459             )

DeserializationError: attempting to get bytes 64473:1330838489
outside expected range 0:144472 for this Chunk
in file /home/cms-jovyan/my_root_files/rntuple_v7_6_0909.root
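
The size of the requested range is itself a clue. Assuming the failing read was a 32-bit little-endian string length, so that the stop offset equals start plus length, the sketch below recovers the length value uproot decoded and the 4 bytes it came from. The result, [0 0 82 79], is two zero bytes followed by ASCII "RO", which looks more like string content than a plausible length. This is consistent with the cursor having already drifted out of alignment, but it does not prove it on its own.

```python
import struct

# Offsets reported in the DeserializationError above.
start = 64473
stop = 1330838489

# Assumption: the failing read was a length-prefixed string, so stop = start + length.
implied_length = stop - start

# The 4 little-endian bytes that would decode to that length.
raw = struct.pack("<I", implied_length)

print(implied_length)       # 1330774016
print(hex(implied_length))  # 0x4f520000
print(list(raw))            # [0, 0, 82, 79]
print(raw)                  # b'\x00\x00RO'
```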

Following the traceback, I compared the bytes near offset 64473 in the ROOT_632 and ROOT_6_X files. The two files have a different structure at that point. The difference is shown in the picture below:

[Image: 632 vs 6_X byte comparison]
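
A byte-level comparison like this can be scripted. The sketch below uses small synthetic buffers as stand-ins for the real files; it finds the first offset at which two byte strings differ and prints a window around it in the same decimal notation used in this issue. `first_difference` and `byte_window` are hypothetical helper names. Whole ROOT files usually differ early (for example in stored timestamps), so on real files it is more useful to compare windows around an offset of interest, such as 64473, than to rely on the first global difference.

```python
import struct


def first_difference(a: bytes, b: bytes) -> int:
    """Return the index of the first differing byte, or -1 if a and b are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the overlapping part: they differ only if the lengths differ.
    return -1 if len(a) == len(b) else min(len(a), len(b))


def byte_window(data: bytes, offset: int, before: int = 8, after: int = 8) -> str:
    """Format data[offset - before : offset + after] as decimals, e.g. [17 0 0 0]."""
    lo = max(0, offset - before)
    hi = min(len(data), offset + after)
    return "[" + " ".join(str(b) for b in data[lo:hi]) + "]"


# Synthetic stand-ins for the two files; real data would come from
# open(path, "rb").read(). The "new" buffer carries 4 extra bytes.
name = b"CorrT1METJet_area"
prefix = b"HDR!"
old = prefix + struct.pack("<I", len(name)) + name
new = prefix + bytes([69, 2, 0, 0]) + struct.pack("<I", len(name)) + name

pos = first_difference(old, new)
print(pos)                                        # 4
print(byte_window(old, pos, before=0, after=4))   # [17 0 0 0]
print(byte_window(new, pos, before=0, after=4))   # [69 2 0 0]
```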

At one point, uproot expects to find the length of a field-name string ([17 0 0 0] for CorrT1METJet_area, whose name is 17 characters long), but instead finds the unexpected value [69 2 0 0] and interprets it as the next string length. Every subsequent cursor read is then misaligned.
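
The failure mode can be reproduced in a few lines, assuming (as the [17 0 0 0] prefix suggests) that the string is stored as a 32-bit little-endian length followed by its characters. `read_string` is a hypothetical stand-in for uproot's cursor logic, not its actual API. Decoded this way, [17 0 0 0] is 17 and [69 2 0 0] is 69 + 2 * 256 = 581, so with 4 extra bytes in front the reader asks for 581 bytes that are not there.

```python
import struct
from typing import Tuple


def read_string(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a uint32 little-endian length at `pos`, then that many bytes.

    Returns the decoded string and the cursor position just after it.
    Raises ValueError when the requested range runs past the buffer,
    analogous to the DeserializationError raised by uproot's Chunk.get.
    """
    (length,) = struct.unpack_from("<I", buf, pos)
    start, stop = pos + 4, pos + 4 + length
    if stop > len(buf):
        raise ValueError(
            f"attempting to get bytes {start}:{stop} outside expected range 0:{len(buf)}"
        )
    return buf[start:stop].decode("utf-8"), stop


name = b"CorrT1METJet_area"
expected = struct.pack("<I", len(name)) + name   # [17 0 0 0] + name
unexpected = bytes([69, 2, 0, 0]) + expected     # 4 extra bytes in front

print(read_string(expected, 0))    # ('CorrT1METJet_area', 21)

try:
    read_string(unexpected, 0)     # [69 2 0 0] is read as a length of 581
except ValueError as err:
    print(err)                     # attempting to get bytes 4:585 outside expected range 0:25

print(read_string(unexpected, 4))  # ('CorrT1METJet_area', 25)
```

Skipping the 4 bytes realigns the cursor, which is why the reader has to know in advance when they are present.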

Conclusion:

Based on these results, I assume that the recent ROOT_6_X version writes an updated RNTuple format with 4 additional bytes in the field metadata. The affected field has type ROOT::VecOps::RVec. Uproot cannot read the file because it expects the older format. These additional 4 bytes need further analysis, and uproot needs to be updated to support both older and recent versions of RNTuple files.
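
Supporting both formats means the reader must know, before it reaches the field name, whether the extra 4 bytes are present. What signals this in the real format is exactly what still needs analysis, so the sketch below simply assumes a hypothetical flag bit (`EXTRA_FLAG`) in a hypothetical record layout. `read_field` and `FieldRecord` are illustrative names, not uproot or ROOT API, and the value 581 merely reuses the [69 2 0 0] bytes reported in this issue.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

# HYPOTHETICAL record layout, for illustration only:
#   uint32 flags | [uint32 extra, only if flags & EXTRA_FLAG] | uint32 len | name
# The real RNTuple field-record layout is defined by ROOT, not by this sketch.
EXTRA_FLAG = 0x04  # hypothetical flag bit


@dataclass
class FieldRecord:
    flags: int
    extra: Optional[int]
    name: str


def read_field(buf: bytes, pos: int) -> Tuple[FieldRecord, int]:
    """Parse one hypothetical field record; return it and the next cursor position."""
    (flags,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    extra = None
    if flags & EXTRA_FLAG:
        # Newer writers: consume the additional 4 bytes instead of misreading them.
        (extra,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    (length,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    name = buf[pos:pos + length].decode("utf-8")
    return FieldRecord(flags, extra, name), pos + length


name = b"CorrT1METJet_area"
old_record = struct.pack("<II", 0, len(name)) + name
new_record = struct.pack("<III", EXTRA_FLAG, 581, len(name)) + name

print(read_field(old_record, 0))  # (FieldRecord(flags=0, extra=None, name='CorrT1METJet_area'), 25)
print(read_field(new_record, 0))  # (FieldRecord(flags=4, extra=581, name='CorrT1METJet_area'), 29)
```

Gating the optional bytes on information the reader has already parsed keeps a single code path for old and new files, rather than guessing from the byte values themselves.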

giedrius2020, Sep 16 '24 09:09

Thank you, @giedrius2020! This is fixed by #1250. I'm still working on the PR, but maybe you could use that branch in the meantime.

ariostas, Sep 27 '24 16:09