cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

Reproducibilty issue concerning `BeamSpotOnlineProducer`

Open mmusich opened this issue 6 months ago • 21 comments

While investigating something different I came across a weird reproducibility issue concerning BeamSpotOnlineProducer, when crossing different files in input. By applying the following patch:

diff --git a/RecoVertex/BeamSpotProducer/plugins/BeamSpotOnlineProducer.cc b/RecoVertex/BeamSpotProducer/plugins/BeamSpotOnlineProducer.cc
index ab070e86606..ba98a91b0c0 100644
--- a/RecoVertex/BeamSpotProducer/plugins/BeamSpotOnlineProducer.cc
+++ b/RecoVertex/BeamSpotProducer/plugins/BeamSpotOnlineProducer.cc
@@ -145,6 +145,7 @@ void BeamSpotOnlineProducer::produce(edm::Event& iEvent, const edm::EventSetup&
   }
 
   std::unique_ptr<reco::BeamSpot> toput = std::make_unique<reco::BeamSpot>(result);
+  edm::LogPrint("BeamSpotOnlineProducer") << "BS from BeamSpotOnlineProducer:   \n" << *toput;
   iEvent.put(std::move(toput));
 }
 

and using the script attached below [1], I tried running the following:

A)

cmsRun test_bs_online_cfg.py inputFiles=/store/express/Run2025B/StreamHLTMonitor/ALCARECO/TkAlHLTTracks-Express-v1/000/391/825/00000/dbf59d59-e4d6-4480-83b2-6ca7c8d8e6e0.root maxEvents=1

and obtained the following printout:

10-Jun-2025 10:14:04 CEST  Successfully opened file root://eoscms.cern.ch//eos/cms/store/express/Run2025B/StreamHLTMonitor/ALCARECO/TkAlHLTTracks-Express-v1/000/391/825/00000/dbf59d59-e4d6-4480-83b2-6ca7c8d8e6e0.root
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:14:04 CEST  Run: 391825 Lumi: 96 Stream: 0
The beam spot record is too old. (record: BeamSpotOnlineLegacyObjectsRcd) record creation time: 1746525190 lumi block time: 1746758172
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:14:04 CEST  Run: 391825 Lumi: 96 Stream: 0
The beam spot record is too old. (record: BeamSpotOnlineHLTObjectsRcd) record creation time: 1746525184 lumi block time: 1746758172
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:14:04 CEST  Run: 391825 Lumi: 96 Stream: 0
None of the online records holds a valid beam spot. The Online Beam Spot producer falls back to the PCL value.
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:14:04 CEST  Run: 391825 Lumi: 97 Stream: 0
The beam spot record is too old. (record: BeamSpotOnlineLegacyObjectsRcd) record creation time: 1746525190 lumi block time: 1746758188
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:14:04 CEST  Run: 391825 Lumi: 97 Stream: 0
The beam spot record is too old. (record: BeamSpotOnlineHLTObjectsRcd) record creation time: 1746525184 lumi block time: 1746758188
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:14:04 CEST  Run: 391825 Lumi: 97 Stream: 0
None of the online records holds a valid beam spot. The Online Beam Spot producer falls back to the PCL value.
%MSG
Begin processing the 1st record. Run 391825, Event 111108632, LumiSection 97 on stream 0 at 10-Jun-2025 10:14:04.471 CEST
BS from BeamSpotOnlineProducer:   
-----------------------------------------------------
              Beam Spot Data

 Beam type    = 2
       X0     = 0.088783 +/- 1.56604e-06 [cm]
       Y0     = -0.189512 +/- 1.55428e-06 [cm]
       Z0     = -0.242728 +/- 0.0108045 [cm]
 Sigma Z0     = 3.91093 +/- 0.00763864 [cm]
 dxdz         = 0.000122072 +/- 3.90099e-07 [radians]
 dydz         = 2.88328e-05 +/- 3.87404e-07 [radians]
 Beam width X = 0.000818877 +/- 3.56881e-06 [cm]
 Beam width Y = 0.000926792 +/- 3.56881e-06 [cm]
 EmittanceX   = 0 [cm]
 EmittanceY   = 0 [cm]
 beta-star    = 0 [cm]
-----------------------------------------------------

B)

when using another file before the one containing Run 391825, Event 111108632, LumiSection 97 with:

cmsRun test_bs_online_cfg.py inputFiles=/store/express/Run2025B/StreamHLTMonitor/ALCARECO/TkAlHLTTracks-Express-v1/000/391/825/00000/7bcff739-9522-4118-8df2-0194276ddef3.root,/store/express/Run2025B/StreamHLTMonitor/ALCARECO/TkAlHLTTracks-Express-v1/000/391/825/00000/dbf59d59-e4d6-4480-83b2-6ca7c8d8e6e0.root maxEvents=5 eventsToSkip=50

I get:

10-Jun-2025 10:15:42 CEST  Successfully opened file root://eoscms.cern.ch//eos/cms/store/express/Run2025B/StreamHLTMonitor/ALCARECO/TkAlHLTTracks-Express-v1/000/391/825/00000/dbf59d59-e4d6-4480-83b2-6ca7c8d8e6e0.root
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:15:42 CEST  Run: 391825 Lumi: 96 Stream: 0
The beam spot record is too old. (record: BeamSpotOnlineLegacyObjectsRcd) record creation time: 1746525190 lumi block time: 1746758172
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:15:42 CEST  Run: 391825 Lumi: 96 Stream: 0
The beam spot record is too old. (record: BeamSpotOnlineHLTObjectsRcd) record creation time: 1746525184 lumi block time: 1746758172
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:15:42 CEST  Run: 391825 Lumi: 96 Stream: 0
None of the online records holds a valid beam spot. The Online Beam Spot producer falls back to the PCL value.
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:15:42 CEST  Run: 391825 Lumi: 97 Stream: 0
The beam spot record is too old. (record: BeamSpotOnlineLegacyObjectsRcd) record creation time: 1746525190 lumi block time: 1746758188
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:15:42 CEST  Run: 391825 Lumi: 97 Stream: 0
The beam spot record is too old. (record: BeamSpotOnlineHLTObjectsRcd) record creation time: 1746525184 lumi block time: 1746758188
%MSG
%MSG-w BeamSpotOnlineProducer:   BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi 10-Jun-2025 10:15:42 CEST  Run: 391825 Lumi: 97 Stream: 0
None of the online records holds a valid beam spot. The Online Beam Spot producer falls back to the PCL value.
%MSG
Begin processing the 5th record. Run 391825, Event 111108632, LumiSection 97 on stream 0 at 10-Jun-2025 10:15:42.169 CEST
BS from BeamSpotOnlineProducer:   
-----------------------------------------------------
              Beam Spot Data

 Beam type    = 2
       X0     = 0.0869003 +/- 3.83505e-05 [cm]
       Y0     = -0.190521 +/- 3.98047e-05 [cm]
       Z0     = -0.135542 +/- 0.0433641 [cm]
 Sigma Z0     = 3.52958 +/- 0.030663 [cm]
 dxdz         = 0.000131853 +/- 1.02839e-05 [radians]
 dydz         = 8.04638e-06 +/- 1.06301e-05 [radians]
 Beam width X = 0.00195148 +/- 6.10571e-05 [cm]
 Beam width Y = 0.00200205 +/- 6.10571e-05 [cm]
 EmittanceX   = 0 [cm]
 EmittanceY   = 0 [cm]
 beta-star    = 0 [cm]
-----------------------------------------------------

so a different payload. Considering that the producer (as the printout states) should fall-back on the PCL tag here:

https://github.com/cms-sw/cmssw/blob/d4e85d2e036f6fa32bab1076bd1c70511db2bc23/RecoVertex/BeamSpotProducer/plugins/BeamSpotOnlineProducer.cc#L143-L145

and

https://github.com/cms-sw/cmssw/blob/d4e85d2e036f6fa32bab1076bd1c70511db2bc23/RecoVertex/BeamSpotProducer/plugins/BeamSpotOnlineProducer.cc#L316-L340

in this case the tag name for the record BeamSpotObjectsRcd is BeamSpotObjects_PCL_byRun_v0_hlt and the payload that should be fetched is

391706       2025-05-06 10:33:04  2394d64fdc919f91bbcb985ffb5364addf296888  BeamSpotObjects   
391728       2025-05-06 20:41:16  daa5aad8f36ae4c3cf7cd428917e1da532e725b7  BeamSpotObjects     <====
391840       2025-05-09 09:25:56  033bb6c95a857944d857d46523f463229b625280  BeamSpotObjects   

and by dumping it:

$ conddb dump daa5aad8f36ae4c3cf7cd428917e1da532e725b7
[2025-06-10 10:33:51,739] INFO: Connecting to pro [frontier://PromptProd/cms_conditions]
[2025-06-10 10:33:51,817] INFO: Found payload of type BeamSpotObjects
[2025-06-10 10:33:52,285] INFO: XML converter for payload class BeamSpotObjects found in the built-in library.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="19">
<cmsCondPayload class_id="0" tracking_level="0" version="0">
	<position->
		<count>3</count>
		<item>8.87829601901142795e-02</item>
		<item>-1.89511769048909984e-01</item>
		<item>-2.42727502509693688e-01</item>
	</position->
...

the correct payload is fetched for case A) but not for case B).

[1]

Script to test

Save as test_bs_online_cfg.py

import FWCore.ParameterSet.Config as cms

import FWCore.ParameterSet.VarParsing as VarParsing
options = VarParsing.VarParsing('analysis')

options.register('globalTag',
                 "150X_dataRun3_HLT_v1", #default value
                 VarParsing.VarParsing.multiplicity.singleton,
                 VarParsing.VarParsing.varType.string,
                 "GlobalTag to use.")

options.register('eventsToSkip',
                 0, #default value
                 VarParsing.VarParsing.multiplicity.singleton,
                 VarParsing.VarParsing.varType.int,
                 "events to skep.")

options.parseArguments()

from Configuration.StandardSequences.Eras import eras
process = cms.Process("test",eras.Run3_2025)

process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring(options.inputFiles),
                            skipEvents = cms.untracked.uint32(options.eventsToSkip)
                            )

process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(options.maxEvents))

####################################################################
# load GlobalTag
####################################################################
process.load("Configuration.StandardSequences.FrontierConditions_GlobalTag_cff")
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, options.globalTag)

####################################################################
# load beamspot
####################################################################
from RecoVertex.BeamSpotProducer.BeamSpotOnline_cfi import onlineBeamSpotProducer as _onlineBeamSpotProducer
process.MyBeamSpot = _onlineBeamSpotProducer.clone()

####################################################################
# Path
####################################################################
process.p = cms.Path(process.MyBeamSpot)

mmusich avatar Jun 10 '25 08:06 mmusich

cms-bot internal usage

cmsbuild avatar Jun 10 '25 08:06 cmsbuild

A new Issue was created by @mmusich.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Jun 10 '25 08:06 cmsbuild

assign RecoVertex/BeamSpotProducer

Dr15Jones avatar Jun 10 '25 13:06 Dr15Jones

assign core

Dr15Jones avatar Jun 10 '25 13:06 Dr15Jones

New categories assigned: alca,reconstruction,core

@atpathak,@Dr15Jones,@jfernan2,@makortel,@mandrenguyen,@perrotta,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Jun 10 '25 13:06 cmsbuild

Did you edit some of the log file? I ask since the first log has

Begin processing the 1st record. Run 391825, Event 111108632, LumiSection 97 on stream 0 at 10-Jun-2025 10:14:04.471 CEST

while the second log only has

Begin processing the 5th record. Run 391825, Event 111108632, LumiSection 97 on stream 0 at 10-Jun-2025 10:15:42.169 CEST

Dr15Jones avatar Jun 10 '25 13:06 Dr15Jones

Did you edit some of the log file? I ask since the first log has

yes, I removed the first 4 events (that are only run in the case B) in order to compare only the printout for Run 391825, Event 111108632, LumiSection 97.

mmusich avatar Jun 10 '25 13:06 mmusich

Along with printing out the context of the data product, could you also print the IOVSync value of the BeamSpotObjectsRcd record? That value is what should correspond to which data base payload should be acquired.

Dr15Jones avatar Jun 10 '25 13:06 Dr15Jones

~~So in the two jobs, the 1st event of the first job has the same EventID as the 5th event of the second job. Given you are reading the same file, that really makes no sense.~~ (I missed there were 2 files in the second example).

Also, which release are you using?

Dr15Jones avatar Jun 10 '25 15:06 Dr15Jones

So using the most recent CMSSW_15_1 release, I added PrintEventSetupContent module to the job. For the two file job, I see the following BeamSpot related IOV changes

The first file causes

%MSG-s ESContent:  PrintEventSetupContent:print@beginRun  10-Jun-2025 11:16:48 CDT Run: 391825
Changed Record
  <datatype> 'label' provider: 'provider label' <provider module type>
BeamSpotObjectsRcd
 start: run: 391728 lumi: 0 event: 0 time: 0
 end:   run: 391839 lumi: 4294967295 event: 18446744073709551615 time: 0
  BeamSpotObjects '' provider:PoolDBESSource 'GlobalTag'
...
BeamSpotOnlineHLTObjectsRcd
 start: run: 391700 lumi: 193 event: 0 time: 0
 end:   run: 391825 lumi: 100 event: 18446744073709551615 time: 0
  BeamSpotOnlineObjects '' provider:PoolDBESSource 'GlobalTag'
...
BeamSpotOnlineLegacyObjectsRcd
 start: run: 391700 lumi: 193 event: 0 time: 0
 end:   run: 391825 lumi: 102 event: 18446744073709551615 time: 0
  BeamSpotOnlineObjects '' provider:PoolDBESSource 'GlobalTag'
%MSG-s ESContent:  PrintEventSetupContent:print@beginLumi  10-Jun-2025 11:16:48 CDT Run: 391825 Lumi: 296
Changed Record
  <datatype> 'label' provider: 'provider label' <provider module type>
BeamSpotOnlineHLTObjectsRcd
 start: run: 391825 lumi: 293 event: 0 time: 0
 end:   run: 391825 lumi: 297 event: 18446744073709551615 time: 0
  BeamSpotOnlineObjects '' provider:PoolDBESSource 'GlobalTag'
BeamSpotOnlineLegacyObjectsRcd
 start: run: 391825 lumi: 293 event: 0 time: 0
 end:   run: 391825 lumi: 297 event: 18446744073709551615 time: 0
  BeamSpotOnlineObjects '' provider:PoolDBESSource 'GlobalTag'

Then after the 2nd file was opened

%MSG-s ESContent:  PrintEventSetupContent:print@beginLumi  10-Jun-2025 11:16:59 CDT Run: 391825 Lumi: 96
Changed Record
  <datatype> 'label' provider: 'provider label' <provider module type>
BeamSpotOnlineHLTObjectsRcd
 start: run: 391700 lumi: 193 event: 0 time: 0
 end:   run: 391825 lumi: 100 event: 18446744073709551615 time: 0
  BeamSpotOnlineObjects '' provider:PoolDBESSource 'GlobalTag'
BeamSpotOnlineLegacyObjectsRcd
 start: run: 391700 lumi: 193 event: 0 time: 0
 end:   run: 391825 lumi: 102 event: 18446744073709551615 time: 0
  BeamSpotOnlineObjects '' provider:PoolDBESSource 'GlobalTag'

Dr15Jones avatar Jun 10 '25 16:06 Dr15Jones

Running just the 1 file gives only

%MSG-s ESContent:  PrintEventSetupContent:print@beginRun  10-Jun-2025 11:28:17 CDT Run: 391825
Changed Record
  <datatype> 'label' provider: 'provider label' <provider module type>
BeamSpotObjectsRcd
 start: run: 391728 lumi: 0 event: 0 time: 0
 end:   run: 391839 lumi: 4294967295 event: 18446744073709551615 time: 0
  BeamSpotObjects '' provider:PoolDBESSource 'GlobalTag'
BeamSpotOnlineHLTObjectsRcd
 start: run: 391700 lumi: 193 event: 0 time: 0
 end:   run: 391825 lumi: 100 event: 18446744073709551615 time: 0
  BeamSpotOnlineObjects '' provider:PoolDBESSource 'GlobalTag'
BeamSpotOnlineLegacyObjectsRcd
 start: run: 391700 lumi: 193 event: 0 time: 0
 end:   run: 391825 lumi: 102 event: 18446744073709551615 time: 0
  BeamSpotOnlineObjects '' provider:PoolDBESSource 'GlobalTag'

Dr15Jones avatar Jun 10 '25 16:06 Dr15Jones

Also, which release are you using?

CMSSW_15_0_6.

mmusich avatar Jun 10 '25 16:06 mmusich

assign db

Dr15Jones avatar Jun 10 '25 17:06 Dr15Jones

New categories assigned: db

@atpathak,@francescobrivio,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Jun 10 '25 18:06 cmsbuild

So the configuration for the module is

>>> print(process.MyBeamSpot.dumpPython())
cms.EDProducer("BeamSpotOnlineProducer",
    beamMode = cms.untracked.uint32(11),
    changeToCMSCoordinates = cms.bool(False),
    gtEvmLabel = cms.InputTag("gtEvmDigis"),
    maxRadius = cms.double(2),
    maxZ = cms.double(40),
    mightGet = cms.optional.untracked.vstring,
    setSigmaZ = cms.double(-1),
    sigmaXYThreshold = cms.double(4),
    sigmaZThreshold = cms.double(2),
    src = cms.InputTag("scalersRawToDigi"),
    timeThreshold = cms.int32(48),
    useBSOnlineRecords = cms.bool(True)
)

In order to better see which code path the module is taking, I added message logger statements to each of the routines that initialize the result.

Running the two file job I see

%MSG-e BeamSpot:  BeamSpotOnlineProducer:MyBeamSpot@streamBeginLumi  10-Jun-2025 14:21:18 CDT Run: 391825 Lumi: 296 Stream: 0
calling createBeamSpotFromRecord
%MSG

which is the only call. That function takes either the beam spot from BeamSpotOnlineLegacyObjectsRcd or from BeamSpotOnlineHLTObjectsRcd and will use that one. Then since after that point result.type() == reco::BeamSpot::Tracker; the module will never update the beam spot EVER.

Running the 1 file job I see

%MSG-e BeamSpot:  BeamSpotOnlineProducer:MyBeamSpot 10-Jun-2025 14:25:05 CDT  Run: 391825 Event: 111108632
calling createBeamSpotFromDB
%MSG

That routine gets the beam spot from BeamSpotObjectsRcd.

Given how this module gets different beam spots based on the order of files it sees explains the discrepancy.

Dr15Jones avatar Jun 10 '25 19:06 Dr15Jones

unassign core

Dr15Jones avatar Jun 10 '25 20:06 Dr15Jones

unassign db

Dr15Jones avatar Jun 10 '25 20:06 Dr15Jones

@gennai @francescobrivio @lguzzi FYI.

mmusich avatar Jun 11 '25 04:06 mmusich

Proposed fixes are:

  • https://github.com/cms-sw/cmssw/pull/48291 (master release, CMSSW_15_1_X, merged in time for CMSSW_15_1_0_pre4)
  • https://github.com/cms-sw/cmssw/pull/48310 (CMSSW_15_0_X, will be available for the next data-taking release after CMSSW_15_0_8).

mmusich avatar Jun 18 '25 07:06 mmusich

@cms-sw/reconstruction-l2 @cms-sw/alca-l2 feel free to sign, such that the issue could be closed.

mmusich avatar Jun 18 '25 07:06 mmusich

+1

jfernan2 avatar Jun 18 '25 07:06 jfernan2

as this was solved (see details at https://github.com/cms-sw/cmssw/issues/48280#issuecomment-2982962357) I am going to close this issue. @cms-sw/alca-l2 feel free to sign post-facto.

mmusich avatar Aug 12 '25 09:08 mmusich

+alca

perrotta avatar Aug 13 '25 08:08 perrotta

This issue is fully signed and ready to be closed.

cmsbuild avatar Aug 13 '25 08:08 cmsbuild