[RNTUPLE_X] RelVal 139.005: object of class vector<edm::StoredProductProvenance> read too few bytes
In CMSSW_15_1_RNTUPLE_X_2025-07-24-1100, RelVal 139.005 failed on step 2:
Begin processing the 29987th record. Run 346512, Event 5233680, LumiSection 6 on stream 0 at 24-Jun-2025 22:04:59.787 CEST
----- Begin Fatal Exception 24-Jun-2025 22:04:59 CEST-----------------------
An exception of category 'FileReadError' occurred while
[0] Reading branch EventProductProvenance
Additional Info:
[a] Fatal Root Error: @SUB=TBufferFile::CheckByteCount
object of class vector<edm::StoredProductProvenance> read too few bytes: 8 instead of 288
----- End Fatal Exception -------------------------------------------------
A new Issue was created by @iarspider.
@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.
assign core
New categories assigned: core
@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks
With -j1 step2 passes.
In CMSSW_15_1_RNTUPLE_X_2025, two relvals failed with this error: Relval 139.005 step 2, Relval 1049.0 step 2
With -j1 step2 passes.
And with -j2, surprisingly. But -j3, -j4, etc. fail.
This is almost certainly a threading problem within ROOT.
assign root
type root
@pcanal Any thoughts? It looks like a threading problem (it doesn't reproduce on 1 thread), and on 4 threads, in all of the past 5 RNTUPLE_X IBs for which we have logs, the job failed after processing between 21k and 370k events (depending on the day).
The meta description does indeed point to a threading problem. The low-level symptom (wrong byte count) unfortunately does not help: it could be that the writing was wrong, that the ROOT meta-data was wrong during reading, or that the TFile was accessed 'wrongly'.
Is there a reproducer with a debug build of ROOT?
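To illustrate why the byte-count symptom alone does not narrow things down, here is a minimal sketch of a length-prefixed read check. It is a simplified model, not ROOT's actual `TBufferFile` implementation: the 4-byte big-endian prefix and the names `write_framed` and `read_framed` are assumptions made up for this example.

```python
import struct


def write_framed(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def read_framed(buf: bytes, consume) -> bytes:
    """Read one framed object and verify how much of it the reader used.

    `consume` stands in for the streaming code: it receives the payload and
    returns the number of bytes it believes the object occupies.
    """
    (declared,) = struct.unpack_from(">I", buf, 0)
    payload = buf[4:4 + declared]
    used = consume(payload)
    if used < declared:
        raise ValueError(f"read too few bytes: {used} instead of {declared}")
    if used > declared:
        raise ValueError(f"read too many bytes: {used} instead of {declared}")
    return payload


if __name__ == "__main__":
    frame = write_framed(b"\0" * 288)
    # A reader whose view of the object layout is wrong stops after 8 bytes.
    try:
        read_framed(frame, lambda payload: 8)
    except ValueError as err:
        print(err)
```

In this model the check only sees that the declared length and the consumed length disagree. A bad length written at the start, a reader with the wrong layout information, or a buffer that does not hold the expected bytes would all trip the same check, which matches the three possibilities listed above.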
a debug build of ROOT
We need to build one, now that the RNTUPLE_X IB is built with the production setup.
@pcanal Here is a recipe on cmsdev42
/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9965/47122/install.sh
cd CMSSW_15_1_RNTUPLE_X_2025-07-06-2300/src
cmsenv
cmsRun /build/mkortela/debug/issue48400/CMSSW_15_1_RNTUPLE_X_2025-07-06-2300/src/reproducer_cfg.py
(although for me the ROOT debug build didn't reproduce the error, possibly because it is slower than the production build and thus hides the problem)
Indeed, I can't reproduce it with the debug build :(
Can I have a recipe with the build/config that reproduces it?
Still on cmsdev42
cmsrel CMSSW_15_1_RNTUPLE_X_2025-07-06-2300
cd CMSSW_15_1_RNTUPLE_X_2025-07-06-2300/src
cmsenv
cmsRun /build/mkortela/debug/issue48400/CMSSW_15_1_RNTUPLE_X_2025-07-06-2300/src/reproducer_cfg.py
We could also rebuild the debug build with RelWithDebInfo instead of Debug if that would help.
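Since the failure is intermittent, one option is to rerun the recipe in a loop until the error signature shows up in the output. Below is a minimal sketch of such a loop. The helper name `run_until_failure` and the default of 20 attempts are made up for illustration, and it assumes the thread count is already set inside the configuration file.

```python
import subprocess
import sys

# Text that identifies the failure discussed in this issue.
SIGNATURE = "TBufferFile::CheckByteCount"


def run_until_failure(cmd, max_attempts=20, signature=SIGNATURE):
    """Run `cmd` repeatedly until its output contains `signature`.

    Returns (attempt_number, captured_output) for the first failing run,
    or None if no run within `max_attempts` showed the signature.
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # the exception text may go to stderr
            text=True,
        )
        if signature in proc.stdout:
            return attempt, proc.stdout
    return None


if __name__ == "__main__":
    # Example: python loop.py cmsRun reproducer_cfg.py
    result = run_until_failure(sys.argv[1:])
    if result is None:
        print("no failure seen")
    else:
        print(f"failed on attempt {result[0]}")
```

The captured output of the failing attempt is returned so that it can be saved next to the core file or debugger session for that run.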
The bug is hiding from me :(. No luck so far seeing it in a debugger with the second recipe either.
Still happens: https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_15_1_RNTUPLE_X_2025-08-13-2300/pyRelValMatrixLogs/run/139.005_AlCaPhiSym2021/step2_AlCaPhiSym2021.log#/377844-377844
----- Begin Fatal Exception 14-Aug-2025 06:18:56 CEST-----------------------
An exception of category 'FileReadError' occurred while
[0] Reading branch EventProductProvenance
Additional Info:
[a] Fatal Root Error: @SUB=TBufferFile::CheckByteCount
object of class vector<edm::StoredProductProvenance> read too few bytes: 8 instead of 288
----- End Fatal Exception -------------------------------------------------
Still happens: http://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc13/CMSSW_16_0_RNTUPLE_X_2025-10-08-2300/pyRelValMatrixLogs/run/139.005_AlCaPhiSym2021/step2_AlCaPhiSym2021.log#/102585-102585
----- Begin Fatal Exception 09-Oct-2025 04:05:31 CEST-----------------------
An exception of category 'FileReadError' occurred while
[0] Reading branch EventProductProvenance
Additional Info:
[a] Fatal Root Error: @SUB=TBufferFile::CheckByteCount
object of class vector<edm::StoredProductProvenance> read too few bytes: 8 instead of 288
----- End Fatal Exception -------------------------------------------------
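To compare occurrences across IBs, the exception blocks can be pulled out of the step2 logs mechanically. The sketch below parses the format shown in the excerpts above. The names `ByteCountError` and `find_byte_count_errors` are hypothetical, and the pattern only covers the "read too few bytes" variant seen in this issue.

```python
import re
from typing import List, NamedTuple


class ByteCountError(NamedTuple):
    timestamp: str
    branch: str
    class_name: str
    bytes_read: int
    bytes_expected: int


# Matches one "Begin Fatal Exception" block of the form quoted in this issue.
_PATTERN = re.compile(
    r"Begin Fatal Exception (?P<ts>.+?)-{5,}.*?"
    r"Reading branch (?P<branch>\S+).*?"
    r"object of class (?P<cls>.+?) read too few bytes: "
    r"(?P<read>\d+) instead of (?P<expected>\d+)",
    re.DOTALL,
)


def find_byte_count_errors(log_text: str) -> List[ByteCountError]:
    """Return every byte-count failure found in the given log text."""
    return [
        ByteCountError(
            timestamp=m.group("ts").strip(),
            branch=m.group("branch"),
            class_name=m.group("cls"),
            bytes_read=int(m.group("read")),
            bytes_expected=int(m.group("expected")),
        )
        for m in _PATTERN.finditer(log_text)
    ]
```

Applied to the three excerpts in this issue, every occurrence reports the same branch, class and byte counts (8 instead of 288), so only the timestamp differs.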