[CLANG_X][NONLTO_X][ASAN_X] RelVal 2500.329 failed: could not invert weight matrix

Open iarspider opened this issue 6 months ago • 12 comments

In CMSSW_15_1_NONLTO_X_2025-06-15-2300 and CMSSW_15_1_CLANG_X_2025-06-15-2300, RelVal 2500.329 failed with exception BasicSingleVertexState::could not invert weight matrix:

----- Begin Fatal Exception 16-Jun-2025 05:17:54 CEST-----------------------
An exception of category 'VertexException' occurred while
   [0] Processing  Event run: 1 lumi: 211534 event: 155899898 stream: 1
   [1] Running path 'nanoAOD_step'
   [2] Calling method for module V0ReBuilder/'KshortToPiPi'
Exception Message:
BasicSingleVertexState::could not invert weight matrix 
----- End Fatal Exception -------------------------------------------------

iarspider avatar Jun 16 '25 09:06 iarspider

cms-bot internal usage

cmsbuild avatar Jun 16 '25 09:06 cmsbuild

A new Issue was created by @iarspider.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Jun 16 '25 09:06 cmsbuild

assign from PhysicsTools/BPHNano

iarspider avatar Jun 16 '25 09:06 iarspider

New categories assigned: xpog

@ftorrresd,@hqucms you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Jun 16 '25 09:06 cmsbuild

First occurrence was in CMSSW_15_1_ASAN_X_2025-06-02-2300

iarspider avatar Jun 16 '25 10:06 iarspider

New failures:

(but these RVs didn't fail in CMSSW_15_1_ASAN_X_2025-06-16-2300)

iarspider avatar Jun 17 '25 07:06 iarspider

@gmelachr @drkovalskyi Could you please take a look at this? Thanks a lot!

hqucms avatar Jun 17 '25 08:06 hqucms

I am having a look. I will let you know ASAP.

gmelachr avatar Jun 17 '25 08:06 gmelachr

Adding @gkaratha and @vmariani to the loop.

The exception is thrown from this line: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/V0ReBuilder.cc#L132 If I have understood correctly, the kinematic fit fails to return the weight/covariance matrix, for a reason that I do not yet know.
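
For readers unfamiliar with the message: it suggests that a vertex weight matrix could not be inverted into a covariance matrix. The following is a generic, self-contained sketch of how such a failure can arise for a singular matrix. It is not the CMSSW implementation; `Sym2`, `invertWeight` and the stand-in `VertexException` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

// Hypothetical stand-in for the framework's VertexException.
struct VertexException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Symmetric 2x2 matrix stored as {a00, a01, a11}.
using Sym2 = std::array<double, 3>;

// Invert a symmetric 2x2 weight matrix to obtain a covariance matrix.
// Throws if the determinant is exactly zero or not finite.
Sym2 invertWeight(const Sym2& w) {
  const double det = w[0] * w[2] - w[1] * w[1];
  if (det == 0.0 || !std::isfinite(det)) {
    throw VertexException("could not invert weight matrix");
  }
  return {w[2] / det, -w[1] / det, w[0] / det};
}
```

A well-conditioned input such as `{2.0, 0.0, 4.0}` inverts normally, while a singular one such as `{1.0, 1.0, 1.0}` throws.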

The proposed solution is to replace this line with this try-catch snippet:

KinVtxFitter fitter;
try {
  // The fit can throw a VertexException, as seen in the RelVal failure.
  fitter = KinVtxFitter({v0daughter1_ttrack, v0daughter2_ttrack},
                        {Track1_mass, Track2_mass},
                        {Track1_sigma, Track2_sigma});
} catch (const VertexException& e) {
  // Treat this as a failed fit and move on to the next candidate.
  edm::LogWarning("KinematicFit") << "Skipping candidate due to fit failure: " << e.what();
  continue;
}

This snippet has been tested locally in both CMSSW_15_1_NONLTO_X_2025-06-15-2300 and CMSSW_15_1_CLANG_X_2025-06-15-2300 and is working fine:

CMSSW_15_1_NONLTO_X_2025-06-15-2300.txt

runthematrix_CMSSW_15_1_CLANG_X_2025-06-15-2300.txt

Furthermore, I would suggest replacing all these lines[*] with similar try-catch snippets, to guard against failures that have not yet been observed in the other decay modes, and opening a PR for both master and the production releases.

@drkovalskyi would it cause any issues for the T0 configurations if we open and merge a PR for the production release now?

[*]
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToTrkLLBuilder.cc#L134
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToTrkLLBuilder.cc#L234
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToTrkTrkLLBuilder.cc#L165
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToTrkTrkLLBuilder.cc#L173
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToTrkTrkLLBuilder.cc#L181
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToTrkTrkLLBuilder.cc#L313
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToTrkTrkLLBuilder.cc#L322
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToTrkTrkLLBuilder.cc#L331
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToV0LLBuilder.cc#L133
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToV0LLBuilder.cc#L249
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToV0TrkDisplacedLLBuilder.cc#L174
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToV0TrkDisplacedLLBuilder.cc#L199
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToV0TrkDisplacedLLBuilder.cc#L322
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToV0TrkLLBuilder.cc#L174
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/BToV0TrkLLBuilder.cc#L328
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/DiLeptonBuilder.cc#L96
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/DiTrackBuilder.cc#L116
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/DiTrackBuilder.cc#L127
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/DiTrackBuilder.cc#L136
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/BPHNano/plugins/V0ReBuilder.cc#L131
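
Since the same guard would be repeated at each of the call sites listed above, one possible way to avoid copying the try-catch block is a small helper that returns an empty `std::optional` when the fit throws. This is only a sketch under assumptions: `tryFit`, `ToyFit`, `toyFit` and `countGoodFits` are hypothetical names, `VertexException` is a local stand-in for the real class, and the warning message of the proposed snippet is omitted for brevity.

```cpp
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the framework's VertexException.
struct VertexException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Runs makeFit() and returns its result, or std::nullopt if it
// throws VertexException. Other exceptions still propagate.
template <typename MakeFit>
auto tryFit(MakeFit&& makeFit) -> std::optional<decltype(makeFit())> {
  try {
    return std::forward<MakeFit>(makeFit)();
  } catch (const VertexException&) {
    return std::nullopt;
  }
}

// Toy fit result and toy fitter; the fitter throws on negative input
// to mimic a failed vertex fit.
struct ToyFit {
  double value;
};

ToyFit toyFit(double x) {
  if (x < 0.0) {
    throw VertexException("toy fit failed");
  }
  return ToyFit{x};
}

// Loops over inputs and skips candidates whose fit failed,
// mirroring the `continue` in the proposed snippet.
int countGoodFits(const std::vector<double>& inputs) {
  int good = 0;
  for (double x : inputs) {
    auto fit = tryFit([&] { return toyFit(x); });
    if (!fit) {
      continue;
    }
    ++good;
  }
  return good;
}
```

At a real call site the lambda would construct the fitter instead of calling `toyFit`, and the caller would log and `continue` when the returned optional is empty.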

gmelachr avatar Jun 18 '25 12:06 gmelachr

Thanks for the fix. Do we know why it fails? I never saw it locally.

gkaratha avatar Jun 18 '25 12:06 gkaratha

My code is full of such try blocks. I treat them as failed fits. As long as the failure rate is small, it shouldn't be an issue.
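
To check that the failure rate really stays small, the skipped fits could be counted alongside the attempted ones. A minimal sketch, assuming a hypothetical `FitStats` helper that is not part of BPHNano:

```cpp
#include <cstddef>

// Hypothetical bookkeeping for fit attempts and failures.
struct FitStats {
  std::size_t attempted = 0;
  std::size_t failed = 0;

  // Record the outcome of one fit attempt.
  void record(bool ok) {
    ++attempted;
    if (!ok) {
      ++failed;
    }
  }

  // Fraction of attempts that failed; 0 if nothing was attempted.
  double failureRate() const {
    return attempted == 0 ? 0.0 : static_cast<double>(failed) / static_cast<double>(attempted);
  }

  // True if the failure rate is above the given threshold.
  bool exceeds(double threshold) const { return failureRate() > threshold; }
};
```

Calling `record(false)` in the catch block and `record(true)` after a successful fit would let a job summary report the rate, and `exceeds` could be used to flag an unexpectedly high value.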

drkovalskyi avatar Jun 18 '25 14:06 drkovalskyi

@gkaratha I do not. I have run the same line dozens of times for 2022, 2023, 2024 and 2025, on data and MC, in many CMSSW releases, and it never caused this error with the gcc compiler. The issue appears when other compilers such as CLANG are used, which we had never tried locally before. Someone who is an expert on compilers should comment further here. I am preparing the PR for CMSSW_15_0_X and CMSSW_15_1_X.

gmelachr avatar Jun 19 '25 06:06 gmelachr