RNTuple fields have 4 unexpected bytes when a recent ROOT master version is used
Version and files:
- uproot version: 5.3.12
- ROOT versions used to create the RNTuple files: ROOT_632 (6.32) and ROOT_6_X (recent master version).
- The file needed to reproduce the issue must be created with the ROOT RNTuple converter. Instructions for creating such a file are in this gist: https://gist.github.com/davidlange6/604f60f8a684b16538f4042bc96a8f18
Issue summary:
After creating RNTuple files with the ROOT converter, we tried to read them with uproot. For files created with the ROOT_632 converter, uproot read all arrays successfully. For files created with the ROOT_6_X converter, uproot could not read the file.
Issue analysis:
When trying to access keys() of the RNTuple file, an error occurred:
File ~/uproot5/src/uproot/source/chunk.py:452, in Chunk.get(self, start, stop, cursor, context)
449 return self._raw_data[local_start:local_stop]
451 else:
--> 452 raise uproot.deserialization.DeserializationError(
453 f"""attempting to get bytes {start}:{stop}
454 outside expected range {self._start}:{self._stop} for this Chunk""",
455 self,
456 cursor.copy(),
457 context,
458 self._source.file_path,
459 )
DeserializationError: attempting to get bytes 64473:1330838489
outside expected range 0:144472 for this Chunk
in file /home/cms-jovyan/my_root_files/rntuple_v7_6_0909.root
Following the traceback, I compared the bytes near offset 64473 in the ROOT_632 and ROOT_6_X files. The comparison showed that the two files have different structures at that point. The difference is shown in the picture below:
In this example, at the point where uproot expects to find the length of the field name string ([17 0 0 0] for CorrT1METJet_area), it instead finds an unexpected value, [69 2 0 0], and takes it as the string length. This breaks all subsequent cursor reads.
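The misread described above can be reproduced in isolation with a few lines of Python. This is only a minimal sketch, not uproot's actual reader: `read_string` is a hypothetical helper, and it assumes the string is stored as a little-endian 32-bit length followed by the string bytes, which is what the [17 0 0 0] prefix suggests. The `bad` buffer simply prepends the four bytes seen in the ROOT_6_X file; what those bytes actually mean is not known here.

```python
import struct


def read_string(buf, pos):
    """Hypothetical helper: read a uint32 length prefix (little-endian),
    then that many bytes as a UTF-8 string. Returns (string, new_pos)."""
    (length,) = struct.unpack_from("<I", buf, pos)
    start = pos + 4
    stop = start + length
    if stop > len(buf):
        # Same kind of failure as in the traceback: the bogus length
        # sends the cursor far outside the available bytes.
        raise ValueError(
            f"attempting to get bytes {start}:{stop} "
            f"outside expected range 0:{len(buf)}"
        )
    return buf[start:stop].decode("utf-8"), stop


name = b"CorrT1METJet_area"

# Layout the reader expects: [17 0 0 0] followed by the name.
good = struct.pack("<I", len(name)) + name

# Assumed layout in the newer file: 4 extra bytes before the length.
bad = bytes([69, 2, 0, 0]) + good

print(read_string(good, 0))

try:
    read_string(bad, 0)
except ValueError as err:
    print("misaligned read:", err)

# Skipping the 4 extra bytes realigns the cursor.
print(read_string(bad, 4))
```

Interpreted as a little-endian integer, [69 2 0 0] is 581, so the reader tries to consume 581 bytes as a field name and every later read starts from the wrong offset.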
Conclusion:
Based on these results, I assume that the recent ROOT_6_X version changed the RNTuple file format so that it contains an additional 4 bytes. Field has ROOT::VecOps::RVec
Thank you, @giedrius2020! This is fixed by #1250. I'm still working on the PR, but maybe you could use that branch in the meantime.