cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

NAN in run 3 PF candidate dz leading to NAN in run 3 DeepMET

Open steggema opened this issue 1 year ago • 29 comments

Checking the DeepMET performance in run 3, we found a significant fraction of events for which DeepMET evaluates to NAN, up to 1% in a four-top sample. We've just traced this down to events where we have a PF candidate where dz() returns NAN, which we don't catch https://github.com/cms-sw/cmssw/blob/master/RecoMET/METPUSubtraction/plugins/DeepMETProducer.cc#L85

The fix will be easy enough (just checking for NAN), but I was wondering if dz() returning NAN for a PF candidate is expected or if it could impact something else.

Tagging co-developers (@yongbinfeng @mseidel42) and whoever i could easily find on github from JME (@alkaloge) PF (@swagata87) Reco (@mandrenguyen)

steggema avatar May 15 '24 12:05 steggema

cms-bot internal usage

cmsbuild avatar May 15 '24 12:05 cmsbuild

A new Issue was created by @steggema.

@rappoccio, @makortel, @antoniovilela, @smuzaffar, @sextonkennedy, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar May 15 '24 12:05 cmsbuild

FYI @cms-sw/jetmet-pog-l2 @cms-sw/pf-l2 @stahlleiton @cms-sw/reconstruction-l2

swagata87 avatar May 15 '24 12:05 swagata87

assign reconstruction

swagata87 avatar May 15 '24 12:05 swagata87

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar May 15 '24 12:05 cmsbuild

type jetmet, pf

swagata87 avatar May 15 '24 12:05 swagata87

what are the ParticleTypes for which NaN is obtained? In which years of Run3 is this happening? and it will be useful to have a recipe where one can reproduce the issue and can debug.

swagata87 avatar May 15 '24 12:05 swagata87

Are the full kinematics nan or just dz? We had some instances where problematic tracks led to this issue last year, but it's extremely difficult to debug to a general class of problems. I would start by excluding them from the deep met.

kdlong avatar May 15 '24 12:05 kdlong

We had some instances where problematic tracks led to this issue last year,

right, thanks, it could be related or similar, here is the documentation of that issue which Kenneth had debugged https://github.com/cms-sw/cmssw/issues/39110

swagata87 avatar May 15 '24 12:05 swagata87

To answer the questions in one go:

  • It's happening for 2022 MC, MiniAOD v4 (only tested this)
  • I looked at the particles, and it's charged hadrons that have all their kinematic values set to 0 (though interestingly the fromPV flag varies), while dz is NAN.
  • I also looked at the bestTrack (where it exists) which has finite dxy() and pt() but already NAN dz() (which puzzles me)
  • It does indeed look related to the other issue in the sense that all kinematic values are zero. Would the fix also address this? Are there samples available with the fix applied? (e.g. would we expect to not see this issue in 2023 or 2024 MC)

To reproduce: A few events with the issue can be obtained with edmCopyPickMerge outputFile=pickevents.root eventsToProcess=1:4256897,1:4257026,1:4257054,1:4418151,1:4418165,1:4418588 inputFiles=/store/mc/Run3Summer22MiniAODv4/DYJetsToLL_M-50_TuneCP5_13p6TeV-madgraphMLM-pythia8/MINIAODSIM/ALCARECO_forPOG_130X_mcRun3_2022_realistic_v5_ext1-v2/2530000/3f52a439-eb16-46f8-b9ab-0b1dd30da09a.root

steggema avatar May 15 '24 13:05 steggema

I looked a bit more at the "bestTrack" (when it exists), and they look really bizarre to me (just looking at the handful of events i mentioned above, i can look at more if people are curious):

  • chi2 is 0, ndof in the 5 to 10 range, and they are high purity (so they go to PF)
  • pT is around 1 GeV, |eta| around 2.7
  • pT error is inf
  • vx/vy and dxy are finite, vz and dz are inf

steggema avatar May 15 '24 14:05 steggema

It does indeed look related to the other issue in the sense that all kinematic values are zero. Would the fix also address this? Are there samples available with the fix applied? (e.g. would we expect to not see this issue in 2023 or 2024 MC)

can you check in 2023 or 2024 MC if the issue is still present or already addressed by Kenneth's fix?

swagata87 avatar May 16 '24 06:05 swagata87

It does indeed look related to the other issue in the sense that all kinematic values are zero. Would the fix also address this? Are there samples available with the fix applied? (e.g. would we expect to not see this issue in 2023 or 2024 MC)

can you check in 2023 or 2024 MC if the issue is still present or already addressed by Kenneth's fix?

The issue still appears to be there in 2023 and 2024 MC. The two samples I checked are /DYto2TautoMuTauh_M-50_TuneCP5_13p6TeV_madgraphMLM-pythia8/Run3Winter24NanoAOD-133X_mcRun3_2024_realistic_v9-v2/NANOAODSIM and /TTto2L2Nu-2Jets_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer23NanoAODv12-130X_mcRun3_2023_realistic_v14-v2/NANOAODSIM - they seemed to be the most recent DY/TT samples I could find but let me know if there are any more recent samples I should check.

steggema avatar May 16 '24 07:05 steggema

I looked a bit more at the "bestTrack" (when it exists), and they look really bizarre to me (just looking at the handful of events i mentioned above, i can look at more if people are curious):

* chi2 is 0, ndof in the 5 to 10 range, and they are high purity (so they go to PF)

* pT is around 1 GeV, |eta| around 2.7

* pT error is inf

* vx/vy and dxy are finite, vz and dz are inf

Do I understand correctly that these tracks are present in the general-track collections already and being propagated to PF? Let me also tag @cms-sw/tracking-pog-l2 as it can be related to track reconstruction.

swagata87 avatar May 17 '24 08:05 swagata87

I looked a bit more at the "bestTrack" (when it exists), and they look really bizarre to me (just looking at the handful of events i mentioned above, i can look at more if people are curious):

* chi2 is 0, ndof in the 5 to 10 range, and they are high purity (so they go to PF)

* pT is around 1 GeV, |eta| around 2.7

* pT error is inf

* vx/vy and dxy are finite, vz and dz are inf

Do I understand correctly that these tracks are present in the general-track collections already and being propagated to PF? Let me also tag @cms-sw/tracking-pog-l2 as it can be related to track reconstruction.

To be very clear, I should say that I only looked at MiniAOD. There the "bestTrack" method returns tracks with the features that I mentioned. I would assume that this implies that they should be in the general-tracks collections and be high-purity since they make it into PF, but it's of course conceivable that something else happens. I think the AOD sample that corresponds to the MINIAOD one above is available so it should be possible to get the events and look into the collections there. I'll see if I can easily do it and will report back in the thread.

steggema avatar May 17 '24 09:05 steggema

Turns out that AODSIM isn't available... I actually can't easily find a run 3 sample that has both MiniAODSIM and AODSIM available - if someone can point me to such a sample, I'm happy to check.

steggema avatar May 17 '24 09:05 steggema

Turns out that AODSIM isn't available... I actually can't easily find a run 3 sample that has both MiniAODSIM and AODSIM available - if someone can point me to such a sample, I'm happy to check.

@cms-sw/pdmv-l2 can you help finding a AODSIM for run3 for checking the above mentioned issue for which we need to access general track collection?

And @steggema just out of curiosity, we expect this issue to also show up in Run3 data, right? If any example of data event is found, kindly paste the event pointer here.

swagata87 avatar May 17 '24 09:05 swagata87

Thanks, the DY->tau tau-> mu tauh sample indeed has AODSIM available on disk (the other doesn't), so I was able to run a quick check with it.

What I did is:

  • Find MiniAOD events where DeepMET fails
  • In the MiniAOD events, find the PF candidates with dz = nan.
  • Amongst those events, check whether there is a "bestTrack", and if yes, find a corresponding track in AODSIM
  • Print properties of these tracks

My main finding is that these tracks have finite dz, vz, and ptError values (whereas they are nan, nan, and inf for the bestTracks). I print some more information below (*) if it helps. I would conclude that it means that something is going wrong in how these tracks are being used within the PF algorithm. Please let me know if there's anything else that I can do to help.

(*) ######## Event 6 pt 1.00 eta -2.84 phi 0.53 dz 0.49 dxy -0.04 found 5 quality 2 True chi2 4.69 ndof 5.0 pt error 0.82 vz 0.22 algo duplicateMerge ######## Event 24 pt 1.09 eta -2.89 phi -2.33 dz 5.62 dxy 0.09 found 5 quality 2 True chi2 4.93 ndof 5.0 pt error 0.99 vz 5.79 algo highPtTripletStep ######## Event 72 pt 0.98 eta -2.66 phi 0.61 dz -1.64 dxy -0.04 found 5 quality 2 True chi2 2.88 ndof 5.0 pt error 0.94 vz -1.84 algo highPtTripletStep ######## Event 101 pt 1.02 eta -2.64 phi 0.90 dz -0.30 dxy 0.02 found 5 quality 2 True chi2 5.39 ndof 5.0 pt error 0.99 vz -0.40 algo highPtTripletStep ######## Event 103 pt 1.01 eta 2.64 phi -2.21 dz -1.46 dxy 0.07 found 5 quality 2 True chi2 7.45 ndof 5.0 pt error 0.85 vz -1.55 algo highPtTripletStep ######## Event 125 pt 1.16 eta 2.84 phi -0.23 dz -1.96 dxy -0.01 found 5 quality 2 True chi2 3.28 ndof 5.0 pt error 1.00 vz -1.55 algo highPtTripletStep ######## Event 136 pt 0.98 eta -2.73 phi -0.58 dz 0.75 dxy -0.01 found 5 quality 2 True chi2 3.43 ndof 5.0 pt error 0.98 vz 0.38 algo highPtTripletStep ######## Event 152 pt 0.97 eta -2.73 phi 0.07 dz 0.51 dxy -0.08 found 5 quality 2 True chi2 9.79 ndof 5.0 pt error 0.91 vz 0.17 algo highPtTripletStep ######## Event 164 pt 1.01 eta -2.71 phi -0.34 dz -1.15 dxy -0.01 found 5 quality 2 True chi2 3.26 ndof 5.0 pt error 0.99 vz -1.51 algo highPtTripletStep ######## Event 181 pt 0.99 eta 2.85 phi 2.04 dz -3.25 dxy 0.01 found 5 quality 2 True chi2 8.43 ndof 5.0 pt error 0.97 vz -3.56 algo duplicateMerge ######## Event 195 pt 0.97 eta 2.71 phi -2.37 dz -4.02 dxy 0.03 found 5 quality 2 True chi2 1.70 ndof 5.0 pt error 0.84 vz -4.18 algo highPtTripletStep

steggema avatar May 17 '24 12:05 steggema

Sorry for the multiple posts, but I realised I could of course also simply look at the PFCandidates in AODSIM. Interestingly, they have finite dzError and finite dz of the bestTrack, but they already have all kinematic values at 0. I'm dumping some more information just below in case it helps for the first 20 or so candidates.

######## Event 0 Found PF candidate with zero pt: pt=0.00 best track pt=0.62 eta=0.00 phi=0.00 dz=3.37 dzError=0.24 dxy=-0.01 dxyError=0.17 pdgId=-211 vz=2.98 algo=duplicateMerge ######## Event 1 Found PF candidate with zero pt: pt=0.00 best track pt=0.57 eta=0.00 phi=0.00 dz=0.70 dzError=0.28 dxy=0.09 dxyError=0.21 pdgId=211 vz=0.99 algo=highPtTripletStep ######## Event 2 Found PF candidate with zero pt: pt=0.00 best track pt=0.59 eta=0.00 phi=0.00 dz=2.28 dzError=0.20 dxy=0.03 dxyError=0.15 pdgId=-211 vz=1.86 algo=highPtTripletStep ######## Event 3 Found PF candidate with zero pt: pt=0.00 best track pt=0.55 eta=0.00 phi=0.00 dz=9.69 dzError=1.05 dxy=0.11 dxyError=0.23 pdgId=-211 vz=9.91 algo=highPtTripletStep ######## Event 4 Found PF candidate with zero pt: pt=0.00 best track pt=0.58 eta=0.00 phi=0.00 dz=-2.69 dzError=0.15 dxy=-0.00 dxyError=0.24 pdgId=-211 vz=-2.35 algo=highPtTripletStep ######## Event 5 Found PF candidate with zero pt: pt=0.00 best track pt=0.85 eta=0.00 phi=0.00 dz=-1.44 dzError=0.10 dxy=0.00 dxyError=0.11 pdgId=-211 vz=-1.47 algo=highPtTripletStep ######## Event 6 Found PF candidate with zero pt: pt=0.00 best track pt=1.00 eta=0.00 phi=0.00 dz=0.49 dzError=2.15 dxy=-0.04 dxyError=0.28 pdgId=211 vz=0.22 algo=duplicateMerge ######## Event 7 Found PF candidate with zero pt: pt=0.00 best track pt=0.91 eta=0.00 phi=0.00 dz=-0.06 dzError=0.17 dxy=-0.03 dxyError=0.10 pdgId=-211 vz=-0.39 algo=highPtTripletStep ######## Event 8 Found PF candidate with zero pt: pt=0.00 best track pt=0.63 eta=0.00 phi=0.00 dz=5.82 dzError=0.15 dxy=-0.01 dxyError=0.17 pdgId=-211 vz=5.47 algo=highPtTripletStep ######## Event 9 Found PF candidate with zero pt: pt=0.00 best track pt=0.74 eta=0.00 phi=0.00 dz=0.71 dzError=1.37 dxy=0.01 dxyError=0.22 pdgId=-211 vz=0.97 algo=highPtTripletStep ######## Event 10 Found PF candidate with zero pt: pt=0.00 best track pt=0.80 eta=0.00 phi=0.00 dz=-0.14 dzError=0.49 dxy=-0.01 dxyError=0.16 pdgId=-211 vz=-0.10 algo=highPtTripletStep ######## Event 11 Found PF candidate with zero pt: pt=0.00 best track pt=0.45 eta=0.00 phi=0.00 dz=7.64 dzError=1.37 dxy=-0.03 dxyError=0.34 pdgId=-211 vz=7.80 algo=duplicateMerge ######## Event 12 Found PF candidate with zero pt: pt=0.00 best track pt=0.70 eta=0.00 phi=0.00 dz=-2.26 dzError=1.74 dxy=0.02 dxyError=0.22 pdgId=-211 vz=-2.53 algo=duplicateMerge ######## Event 13 Found PF candidate with zero pt: pt=0.00 best track pt=0.89 eta=0.00 phi=0.00 dz=-1.81 dzError=1.07 dxy=0.02 dxyError=0.17 pdgId=211 vz=-2.00 algo=highPtTripletStep ######## Event 14 Found PF candidate with zero pt: pt=0.00 best track pt=0.44 eta=0.00 phi=0.00 dz=-1.45 dzError=0.37 dxy=0.08 dxyError=0.45 pdgId=-211 vz=-1.47 algo=highPtTripletStep ######## Event 15 Found PF candidate with zero pt: pt=0.00 best track pt=0.78 eta=0.00 phi=0.00 dz=1.16 dzError=0.34 dxy=-0.05 dxyError=0.11 pdgId=-211 vz=1.55 algo=highPtTripletStep ######## Event 16 Found PF candidate with zero pt: pt=0.00 best track pt=0.71 eta=0.00 phi=0.00 dz=-2.75 dzError=0.15 dxy=0.02 dxyError=0.17 pdgId=211 vz=-2.48 algo=highPtTripletStep ######## Event 17 Found PF candidate with zero pt: pt=0.00 best track pt=0.85 eta=0.00 phi=0.00 dz=-2.15 dzError=1.41 dxy=-0.03 dxyError=0.18 pdgId=-211 vz=-2.35 algo=duplicateMerge ######## Event 18 Found PF candidate with zero pt: pt=0.00 best track pt=0.55 eta=0.00 phi=0.00 dz=-1.17 dzError=0.92 dxy=0.10 dxyError=0.21 pdgId=-211 vz=-0.85 algo=highPtTripletStep ######## Event 19 Found PF candidate with zero pt: pt=0.00 best track pt=0.53 eta=0.00 phi=0.00 dz=3.07 dzError=0.89 dxy=0.01 dxyError=0.23 pdgId=211 vz=3.07 algo=highPtTripletStep ######## Event 20 Found PF candidate with zero pt: pt=0.00 best track pt=0.79 eta=0.00 phi=0.00 dz=-8.82 dzError=0.47 dxy=0.05 dxyError=0.13 pdgId=-211

steggema avatar May 17 '24 12:05 steggema

thanks, interesting.. looks like something went wrong in AOD->miniAOD step

swagata87 avatar May 17 '24 13:05 swagata87

Agreed. For what it's worth, it seems plausible to me that some of the packed candidate creation or methods rely on momenta not being zero, so maybe the right question to ask is what is the purpose of having a PF candidate with zero momentum in the first place...

steggema avatar May 17 '24 13:05 steggema

We had this discussion in the past. In principle zero momentum means that the kinematics aren't reliably estimated to be different than zero within uncertainties. But they can still play a role in candidate counting etc. I wouldn't say they are super useful, but it also wasn't obvious to me that we should kill them. Regardless we should avoid that zeros turn to nans, there's no world in which that is desirable.

kdlong avatar May 17 '24 13:05 kdlong

FWIW, at least these lines in packVtx() https://github.com/cms-sw/cmssw/blob/d376eb350a3e9af9f0ce3618fbac6afc36ce56d8/DataFormats/PatCandidates/src/PackedCandidate.cc#L42-L43 will result NaN with zero momentum track.

makortel avatar May 17 '24 13:05 makortel

If I recall correctly, I think the track properties are not stored in the packed PF candidate produced by the PAT packed candidate producer if the pt < 0.95 GeV

stahlleiton avatar May 17 '24 14:05 stahlleiton

This was lowered to 0.0 for Run3 MINIAOD (with low quality covariance matrix).

AdrianoDee avatar May 17 '24 15:05 AdrianoDee

This was lowered to 0.0 for Run3 MINIAOD (with low quality covariance matrix).

Ah yes, but only if it strictly larger than 0 GeV (so candidates with pT = 0 would then have no track properties)

stahlleiton avatar May 17 '24 15:05 stahlleiton

FWIW, at least these lines in packVtx()

https://github.com/cms-sw/cmssw/blob/d376eb350a3e9af9f0ce3618fbac6afc36ce56d8/DataFormats/PatCandidates/src/PackedCandidate.cc#L42-L43

will result NaN with zero momentum track.

thanks Matti

this[1] change seems to help;

Examples from /eos/cms//store/mc/Run3Winter24Reco/DYto2TautoMuTauh_M-50_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/133X_mcRun3_2024_realistic_v9-v2/2830000/00502d01-cccd-453a-9f6d-160572a6b2ae.root after running AOD->miniAOD step for 200 evets; and looping over packedPFCandidates:

Before change [1]

dz, dxy, eta, pt  nan -0.06972656399011612 0.0 0.0
dz, dxy, eta, pt  nan -0.014941406436264515 0.0 0.0

after change [1]

dz, dxy, eta, pt  5.226568222045898 -0.06972656399011612 0.0 0.0
dz, dxy, eta, pt  5.175044059753418 -0.014941406436264515 0.0 0.0

[1]

--- a/DataFormats/PatCandidates/src/PackedCandidate.cc
+++ b/DataFormats/PatCandidates/src/PackedCandidate.cc
@@ -40,7 +40,9 @@ void pat::PackedCandidate::packVtx(bool unpackAfterwards) {
   // float dl = dxPV * c + dyPV * s;
   // float xRec = - dxy_ * s + dl * c, yRec = dxy_ * c + dl * s;
   float pzpt = p4_.load()->Pz() / p4_.load()->Pt();
-  dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+  if (!std::isnan(pzpt)) {
+    dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+  }

swagata87 avatar May 17 '24 16:05 swagata87

--- a/DataFormats/PatCandidates/src/PackedCandidate.cc
+++ b/DataFormats/PatCandidates/src/PackedCandidate.cc
@@ -40,7 +40,9 @@ void pat::PackedCandidate::packVtx(bool unpackAfterwards) {
   // float dl = dxPV * c + dyPV * s;
   // float xRec = - dxy_ * s + dl * c, yRec = dxy_ * c + dl * s;
   float pzpt = p4_.load()->Pz() / p4_.load()->Pt();
-  dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+  if (!std::isnan(pzpt)) {
+    dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+  }

I'd suggest to check e.g. p4_.load()->Pt() != 0.f (or also Pz() != 0.f depending if Inf is acceptable or not) directly instead of allowing NaN to be computed. On a quick look I also didn't see any other place that would initialize dz_, so I'd suggest to add explicit initialization somewhere.

makortel avatar May 17 '24 18:05 makortel

okay, how about this? with this change, the nan issue is solved, some before/after numbers are pasted below,

-  float pzpt = p4_.load()->Pz() / p4_.load()->Pt();
-  dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+  dz_ = 0;
+  if (p4_.load()->Pt() != 0.f) {
+    float pzpt = p4_.load()->Pz() / p4_.load()->Pt();
+    dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+  }

before the change

dz, dxy, eta, pt  nan -0.06972656399011612 0.0 0.0
dz, dxy, eta, pt  nan -0.014941406436264515 0.0 0.0
dz, dxy, eta, pt  nan 0.08945312350988388 0.0 0.0
dz, dxy, eta, pt  nan -0.0037353516090661287 0.0 0.0
dz, dxy, eta, pt  nan -0.022695312276482582 0.0 0.0
dz, dxy, eta, pt  nan 0.0036865235306322575 0.0 0.0
dz, dxy, eta, pt  nan 0.009384765289723873 0.0 0.0
dz, dxy, eta, pt  nan 0.07921875268220901 0.0 0.0
dz, dxy, eta, pt  nan -0.0006768798921257257 0.0 0.0

after the change

dz, dxy, eta, pt  5.226568222045898 -0.06972656399011612 0.0 0.0
dz, dxy, eta, pt  5.175044059753418 -0.014941406436264515 0.0 0.0
dz, dxy, eta, pt  2.9320716857910156 0.08945312350988388 0.0 0.0
dz, dxy, eta, pt  12.334485054016113 -0.0037353516090661287 0.0 0.0
dz, dxy, eta, pt  16.324251174926758 -0.022695312276482582 0.0 0.0
dz, dxy, eta, pt  -0.19352594017982483 0.0036865235306322575 0.0 0.0
dz, dxy, eta, pt  3.242164134979248 0.009384765289723873 0.0 0.0
dz, dxy, eta, pt  1.036512017250061 0.07921875268220901 0.0 0.0
dz, dxy, eta, pt  3.9245831966400146 -0.0006768798921257257 0.0 0.0

swagata87 avatar May 19 '24 15:05 swagata87