Memory usage in AlCaLumiPixelsCounts jobs for run 382300
Tier-0 reports several jobs with high memory usage in run 382300. One example that reproduces is
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024F/AlCaHarvest/job_863561/02ce5b03-cdf6-4215-95c4-e4b3ef3ed8c1-0-1-logArchive.tar.gz
which goes to 3+ GB of RSS very quickly (e.g., at the start of event processing) and peaks around 6 GB.
The job writes 3 output files with (if I understand correctly) a total of about 200 MB per lumi section and no event data.
A new Issue was created by @davidlange6.
@antoniovilela, @Dr15Jones, @sextonkennedy, @smuzaffar, @makortel, @rappoccio can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign alca
New categories assigned: alca
@saumyaphor4252,@perrotta,@consuegs you have been requested to review this Pull request/Issue and eventually sign? Thanks
I'm feeling confused: is this application doing more than copying out parts of the lumiblock information into a new EDM file [e.g., removing the TriggerResults event products and some of the lumiblock products]?
E.g., the outputs appear to share common lumi products and are basically the same size as the input. For example, one output file copies out
*Br 7 :recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.obj : *
* | reco::PixelClusterCounts *
*Entries : 3 : Total Size= 1894371109 bytes File Size = 296267534 *
*Baskets : 3 : Basket Size= 4693387 bytes Compression= 6.39 *
*............................................................................*
*Br 8 :recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.present : *
* | Bool_t *
*Entries : 3 : Total Size= 1248 bytes File Size = 471 *
*Baskets : 3 : Basket Size= 9386 bytes Compression= 1.00 *
*............................................................................*
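(For reference, a minimal ROOT macro along these lines reproduces that kind of branch-size dump; the file name is a placeholder, and the per-lumi products of an EDM file sit in the "LuminosityBlocks" tree.)

// inspectLumiBranches.C -- run with: root -l -b -q 'inspectLumiBranches.C("somefile.root")'
// Minimal sketch; the default file name below is a placeholder.
#include "TFile.h"
#include "TTree.h"

void inspectLumiBranches(const char* fname = "alcaPCCZeroBias.root") {
  TFile* f = TFile::Open(fname);
  if (!f || f->IsZombie())
    return;
  // Per-lumi products of an EDM file live in the "LuminosityBlocks" tree;
  // Print() reports the Total Size / File Size / Compression figures quoted above.
  auto* lumiTree = static_cast<TTree*>(f->Get("LuminosityBlocks"));
  if (lumiTree)
    lumiTree->Print("recoPixelClusterCounts_*");
  f->Close();
}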
Just to reiterate what @davidlange6 found: when we read back the LuminosityBlock, the reco::PixelClusterCounts object stored in the lumi requires on average 1.9 GB / 3 (dividing the total in-memory size reported by ROOT by the 3 lumis in the file), so > 600 MB. At a file boundary, the framework doesn't know whether the new file being read contains more of the same LuminosityBlock as the last file it read, so it keeps all the LuminosityBlock products from the last file in memory at the same time as it reads the data products of the LuminosityBlock from the new file. So it needs ~1.2 GB or so just for this.
If the reco::PixelClusterCounts for the different LuminosityBlocks are not roughly the same size (say one is 2x bigger than the others) then the memory requirements can get even worse.
It seems like reco::PixelClusterCounts is holding data PER EVENT, which scales poorly as the number of events in a LuminosityBlock increases.
@Dr15Jones - I do not think there is any per-event data there. PixelClusterCounts is effectively holding two 2D histograms (hits per bx per ROC/module) and a 1D histogram (of events per bx).
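For intuition, a simplified sketch of that layout and of how it scales (this is not the actual DataFormats class; the member names, the flattening order, and the module/ROC counts are illustrative, the latter matching the audit dump further down this thread):

// Simplified sketch, NOT the real reco::PixelClusterCounts implementation. It only
// illustrates how "two 2D histograms plus a 1D histogram" flattened into int vectors
// scale with the number of modules/ROCs and the 3564 bunch crossings of an LHC orbit.
#include <cstddef>
#include <iostream>
#include <vector>

struct PixelClusterCountsSketch {
  static constexpr std::size_t kNbx = 3564;  // bunch crossings per LHC orbit
  std::vector<int> countsPerModulePerBx;     // nModules * kNbx entries (cf. readCounts)
  std::vector<int> countsPerRocPerBx;        // nRocs    * kNbx entries (cf. readRocCounts)
  std::vector<int> eventsPerBx;              // kNbx entries            (cf. readEvents)
};

int main() {
  const std::size_t nModules = 1796, nRocs = 42480;  // illustrative, same order as the audit dump below
  const double MB = 1024.0 * 1024.0;
  std::cout << "module x bx counts: " << nModules * PixelClusterCountsSketch::kNbx * sizeof(int) / MB
            << " MB per lumi\n"  // ~24 MB
            << "ROC x bx counts:    " << nRocs * PixelClusterCountsSketch::kNbx * sizeof(int) / MB
            << " MB per lumi\n";  // ~578 MB, dominating the per-lumi memory
}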
@Dr15Jones @davidlange6 Dear all, BRIL RC here. David is correct: we don't store per-event data, because for luminosity we are interested only in the "effective" rates for every bx, which we can later rescale to the luminosity. The biggest change compared to the previous release is the per-ROC data, which increased the event and LS size. The data is extremely useful for the precision luminosity measurement. We can try to remove some modules or update the thresholds to decrease the event size. But it would be helpful if you could provide us with a realistic "target" that Tier-0 could tolerate.
Ok, so we've understood what is new and creating problems.
Why is it useful to split the data from the input file into three pieces (e.g., a per-lumi data product)? Or am I missing some other functionality happening in this process?
I made a trivial 'auditing' analyzer of PixelClusterCounts and had it dump information each lumi. For the files in question, the dumps were relatively consistent, with values like
%MSG-s PixelClusterCountsAudit: PixelClusterCountsAuditor:audit@beginLumi 26-Jun-2024 16:47:23 CEST Run: 382300 Lumi: 17
Branch: recoPixelClusterCounts_alcaPCCIntegratorRandom_alcaPCCRandom_RECO.
readCounts: 6400944
readRocCounts: 151398720
readEvents: 3564
readModID: 1796
readRocID: 42480
%MSG
%MSG-s PixelClusterCountsAudit: PixelClusterCountsAuditor:audit@beginLumi 26-Jun-2024 16:47:23 CEST Run: 382300 Lumi: 17
Branch: recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.
readCounts: 6400944
readRocCounts: 151423668
readEvents: 3564
readModID: 1796
readRocID: 42487
%MSG
Given the values are ints, which are 4 bytes each, that is ~600 MB for each readRocCounts.
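For completeness, a minimal sketch of such an auditing analyzer (the module name, label, and details are illustrative assumptions, not the exact code used above; the accessors mirror the fields printed in the dump):

// Illustrative sketch of an auditing EDAnalyzer; not the exact module used above.
#include "DataFormats/Common/interface/Handle.h"
#include "DataFormats/Luminosity/interface/PixelClusterCounts.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/LuminosityBlock.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/Framework/interface/one/EDAnalyzer.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/Utilities/interface/InputTag.h"

class PixelClusterCountsAuditor : public edm::one::EDAnalyzer<edm::one::WatchLuminosityBlocks> {
public:
  explicit PixelClusterCountsAuditor(edm::ParameterSet const& ps)
      : token_(consumes<reco::PixelClusterCounts, edm::InLumi>(ps.getParameter<edm::InputTag>("src"))) {}

  void beginLuminosityBlock(edm::LuminosityBlock const& lumi, edm::EventSetup const&) override {
    edm::Handle<reco::PixelClusterCounts> counts;
    lumi.getByToken(token_, counts);
    if (!counts.isValid())
      return;
    // The vector sizes are what drive the in-memory footprint discussed above.
    edm::LogSystem("PixelClusterCountsAudit")
        << "Run: " << lumi.run() << " Lumi: " << lumi.luminosityBlock()
        << "\n readCounts: " << counts->readCounts().size()
        << "\n readRocCounts: " << counts->readRocCounts().size()
        << "\n readEvents: " << counts->readEvents().size();
  }
  void endLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) override {}
  void analyze(edm::Event const&, edm::EventSetup const&) override {}

private:
  edm::EDGetTokenT<reco::PixelClusterCounts> token_;
};

DEFINE_FWK_MODULE(PixelClusterCountsAuditor);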
@davidlange6 David, maybe I am missing something, what is the third file? I thought there were 2 files: for Zero-bias and Random data. I don't understand why they are the same.
Maybe the third thing is different; I did not check. I mean
process.ALCARECOStreamAlCaPCCRandomOutPath, process.ALCARECOStreamAlCaPCCZeroBiasOutPath, process.ALCARECOStreamRawPCCProducerOutPath
Ah - the output of ALCARECOStreamRawPCCProducerOutPath is indeed small (2% of the others)
@davidlange6 We have identified a few possible solutions to reduce the number of entries and will try to implement them as soon as possible. However, I have two questions:
- What should be the target rate reduction factor to safely operate Tier0?
- How much time do we realistically have to implement this fix?
I understand the urgency of finding a solution, but we want to avoid making any physically unmotivated cuts. Sorry for any inconvenience caused.
What difference would a rate change make? These objects are presumably roughly the same size regardless of whether the rate is 0.1 Hz or 2000 Hz, no?
As I asked above, do we need the processing step at all? (Maybe something to discuss with all groups at Monday's joint ops meeting.)
Apologies for the confusion, I didn't mean trigger rates. I meant that we could mask, for instance, some of the innermost BPix layers, which might be less useful for us; that alone could decrease the object size several-fold. Or we could adjust the threshold to cut some potentially noisy pixels, and so on. But it would be really helpful if you could give an estimate of the reduction factor needed for the object (2 times? 10?).
Not so much for me to answer, but nominally this workflow should run in 2 GB and currently takes ~6 GB.
Personally I’d say this data product should take less than 100MB (that would be 25M entries in the vector) and preferably closer to 10MB.
I've prepared a fix that should reduce readRocCounts by a factor of 3564 (effectively removing the per-bx granularity). readCounts will remain unchanged. PR: https://github.com/cms-sw/cmssw/pull/45348
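Schematically, removing the per-bx granularity amounts to integrating the per-ROC counts over the orbit, along the lines of the sketch below (illustration only, not the actual code of the PR; the roc-major flattening is an assumption):

// Illustration only -- not the actual change in the PR. Integrating the per-ROC counts
// over the 3564 bunch crossings shrinks that vector by exactly that factor.
#include <cstddef>
#include <vector>

constexpr std::size_t kNbx = 3564;  // bunch crossings per LHC orbit

// before: one int per (ROC, bx) -> nRocs * 3564 entries (~600 MB for ~42k ROCs)
// after : one int per ROC       -> nRocs entries        (~170 kB for ~42k ROCs)
std::vector<int> integrateOverBx(const std::vector<int>& rocCountsPerBx, std::size_t nRocs) {
  std::vector<int> rocCounts(nRocs, 0);
  for (std::size_t roc = 0; roc < nRocs; ++roc)
    for (std::size_t bx = 0; bx < kNbx; ++bx)
      rocCounts[roc] += rocCountsPerBx[roc * kNbx + bx];  // assumed roc-major layout
  return rocCounts;
}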
The unmerged files that were the input of the original job will be removed by the usual Tier0 workflow. I copied them to this location so they can be used for testing the fix:
/eos/user/c/cmst0/public/PausedJobs/Run2024F/AlCaHarvest/input
Dear all, the patch went to the CMSSW_14_0_11 release. Once it is tested at T0, please let us know if it resolves the issue.
@cms-sw/alca-l2 Since https://github.com/cms-sw/cmssw/pull/45348 and https://github.com/cms-sw/cmssw/pull/45369 have been merged (long ago), I guess we could close this issue?
Kindly ping @cms-sw/alca-l2 to sign and close the issue. Thanks.
+alca
This issue is fully signed and ready to be closed.
@cmsbuild, please close