Memory usage in AlCaLumiPixelsCounts jobs for run 382300
Tier-0 reports several jobs with high memory usage in run 382300. One example that reproduces is
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024F/AlCaHarvest/job_863561/02ce5b03-cdf6-4215-95c4-e4b3ef3ed8c1-0-1-logArchive.tar.gz
which goes to 3+ GB of RSS very quickly (e.g., at the start of event processing) and peaks around 6 GB.
The job writes 3 output files with (if I understand correctly) a total of about 200 MB per lumi section and no event data.
A new Issue was created by @davidlange6.
@antoniovilela, @Dr15Jones, @sextonkennedy, @smuzaffar, @makortel, @rappoccio can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign alca
New categories assigned: alca
@saumyaphor4252,@perrotta,@consuegs you have been requested to review this Pull request/Issue and eventually sign? Thanks
I'm feeling confused: is this application doing more than copying out parts of the lumiblock information into a new EDM file [e.g., removing the TriggerResults event products and some of the lumiblock products]?
E.g., the outputs appear to share common lumi products and are basically the same size as the input. For example, one output file copies out
*Br 7 :recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.obj : *
* | reco::PixelClusterCounts *
*Entries : 3 : Total Size= 1894371109 bytes File Size = 296267534 *
*Baskets : 3 : Basket Size= 4693387 bytes Compression= 6.39 *
*............................................................................*
*Br 8 :recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.present : *
* | Bool_t *
*Entries : 3 : Total Size= 1248 bytes File Size = 471 *
*Baskets : 3 : Basket Size= 9386 bytes Compression= 1.00 *
*............................................................................*
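(For reference, a minimal ROOT macro along these lines reproduces that kind of branch-size dump; the file name is a placeholder, and the per-lumi products of an EDM file sit in the "LuminosityBlocks" tree.)

// inspectLumiBranches.C -- run with: root -l -b -q 'inspectLumiBranches.C("somefile.root")'
// Minimal sketch; the default file name below is a placeholder.
#include "TFile.h"
#include "TTree.h"

void inspectLumiBranches(const char* fname = "alcaPCCZeroBias.root") {
  TFile* f = TFile::Open(fname);
  if (!f || f->IsZombie())
    return;
  // Per-lumi products of an EDM file live in the "LuminosityBlocks" tree;
  // Print() reports the Total Size / File Size / Compression figures quoted above.
  auto* lumiTree = static_cast<TTree*>(f->Get("LuminosityBlocks"));
  if (lumiTree)
    lumiTree->Print("recoPixelClusterCounts_*");
  f->Close();
}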
Just to reiterate what @davidlange6 found: when we read back the LuminosityBlock, the reco::PixelClusterCounts object stored in the lumi requires on average 1.9 GB / 3 (dividing the total in-memory size reported by ROOT by the 3 lumis in the file), so > 600 MB. At a file boundary, the framework doesn't know whether the new file being read contains more of the same LuminosityBlock as the last file it read, so it keeps all the LuminosityBlock products from the last file in memory at the same time as it reads the data products of the LuminosityBlock from the new file. So it needs ~1.2 GB or so just for this.
If the reco::PixelClusterCounts for the different LuminosityBlocks are not roughly the same size (say one is 2x bigger than the others) then the memory requirements can get even worse.
It seems like reco::PixelClusterCounts is holding data PER EVENT, which scales poorly as the number of events in a LuminosityBlock increases.
@Dr15Jones - I do not think there is any per-event data there. PixelClusterCounts is effectively holding two 2D histograms (hits per bx per ROC/module) and a 1D histogram (of events per bx).
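For intuition, a simplified sketch of that layout and of how it scales (this is not the actual DataFormats class; the member names, the flattening order, and the module/ROC counts are illustrative, the latter matching the audit dump further down this thread):

// Simplified sketch, NOT the real reco::PixelClusterCounts implementation. It only
// illustrates how "two 2D histograms plus a 1D histogram" flattened into int vectors
// scale with the number of modules/ROCs and the 3564 bunch crossings of an LHC orbit.
#include <cstddef>
#include <iostream>
#include <vector>

struct PixelClusterCountsSketch {
  static constexpr std::size_t kNbx = 3564;  // bunch crossings per LHC orbit
  std::vector<int> countsPerModulePerBx;     // nModules * kNbx entries (cf. readCounts)
  std::vector<int> countsPerRocPerBx;        // nRocs    * kNbx entries (cf. readRocCounts)
  std::vector<int> eventsPerBx;              // kNbx entries            (cf. readEvents)
};

int main() {
  const std::size_t nModules = 1796, nRocs = 42480;  // illustrative, same order as the audit dump below
  const double MB = 1024.0 * 1024.0;
  std::cout << "module x bx counts: " << nModules * PixelClusterCountsSketch::kNbx * sizeof(int) / MB
            << " MB per lumi\n"  // ~24 MB
            << "ROC x bx counts:    " << nRocs * PixelClusterCountsSketch::kNbx * sizeof(int) / MB
            << " MB per lumi\n";  // ~578 MB, dominating the per-lumi memory
}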
@Dr15Jones @davidlange6 Dear all, BRIL RC here. David is correct: we don't store per-event data, because for luminosity we are interested only in the "effective" rates for every bx, which we can later rescale to the luminosity. The biggest change compared to the previous release is the per-ROC data, which increased the event and LS size. The data is extremely useful for the precision luminosity measurement. We can try to remove some modules or update the thresholds to decrease the event size. But it would be helpful if you could provide us with a realistic "target" that Tier-0 could tolerate.
Ok, so we've understood what is new and creating problems.
Why is it useful to split the data from the input file into three pieces (e.g., a per-lumi data product)? Or am I missing some other functionality happening in this process?
I made a trivial 'auditing' analyzer of PixelClusterCounts and had it dump information each lumi. For the files in question, the dumps were relatively consistent, with values like
%MSG-s PixelClusterCountsAudit: PixelClusterCountsAuditor:audit@beginLumi 26-Jun-2024 16:47:23 CEST Run: 382300 Lumi: 17
Branch: recoPixelClusterCounts_alcaPCCIntegratorRandom_alcaPCCRandom_RECO.
readCounts: 6400944
readRocCounts: 151398720
readEvents: 3564
readModID: 1796
readRocID: 42480
%MSG
%MSG-s PixelClusterCountsAudit: PixelClusterCountsAuditor:audit@beginLumi 26-Jun-2024 16:47:23 CEST Run: 382300 Lumi: 17
Branch: recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.
readCounts: 6400944
readRocCounts: 151423668
readEvents: 3564
readModID: 1796
readRocID: 42487
%MSG
Given the values are ints, which are 4 bytes each, that is ~600 MB for each readRocCounts.
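For completeness, a minimal sketch of such an auditing analyzer (the module name, label, and details are illustrative assumptions, not the exact code used above; the accessors mirror the fields printed in the dump):

// Illustrative sketch of an auditing EDAnalyzer; not the exact module used above.
#include "DataFormats/Common/interface/Handle.h"
#include "DataFormats/Luminosity/interface/PixelClusterCounts.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/LuminosityBlock.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/Framework/interface/one/EDAnalyzer.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/Utilities/interface/InputTag.h"

class PixelClusterCountsAuditor : public edm::one::EDAnalyzer<edm::one::WatchLuminosityBlocks> {
public:
  explicit PixelClusterCountsAuditor(edm::ParameterSet const& ps)
      : token_(consumes<reco::PixelClusterCounts, edm::InLumi>(ps.getParameter<edm::InputTag>("src"))) {}

  void beginLuminosityBlock(edm::LuminosityBlock const& lumi, edm::EventSetup const&) override {
    edm::Handle<reco::PixelClusterCounts> counts;
    lumi.getByToken(token_, counts);
    if (!counts.isValid())
      return;
    // The vector sizes are what drive the in-memory footprint discussed above.
    edm::LogSystem("PixelClusterCountsAudit")
        << "Run: " << lumi.run() << " Lumi: " << lumi.luminosityBlock()
        << "\n readCounts: " << counts->readCounts().size()
        << "\n readRocCounts: " << counts->readRocCounts().size()
        << "\n readEvents: " << counts->readEvents().size();
  }
  void endLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) override {}
  void analyze(edm::Event const&, edm::EventSetup const&) override {}

private:
  edm::EDGetTokenT<reco::PixelClusterCounts> token_;
};

DEFINE_FWK_MODULE(PixelClusterCountsAuditor);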
@davidlange6 David, maybe I am missing something, what is the third file? I thought there were 2 files: for Zero-bias and Random data. I don't understand why they are the same.
Maybe the third thing is different; I did not check. I mean
process.ALCARECOStreamAlCaPCCRandomOutPath, process.ALCARECOStreamAlCaPCCZeroBiasOutPath, process.ALCARECOStreamRawPCCProducerOutPath
Ah - the output of ALCARECOStreamRawPCCProducerOutPath is indeed small (2% of the others)
@davidlange6 We have identified a few possible solutions to reduce the number of entries and will try to implement them as soon as possible. However, I have two questions:
- What should be the target rate reduction factor to safely operate Tier0?
- How much time do we realistically have to implement this fix?
I understand the urgency of finding a solution, but we want to avoid making any physically unmotivated cuts. Sorry for any inconvenience caused.
What difference would a rate change make? These objects are presumably roughly the same size regardless of whether the rate is 0.1 Hz or 2000 Hz, no?
As I asked above, do we need the processing step at all? (Maybe something to discuss with all groups at Monday's joint ops meeting.)
Apologies for the confusion, I didn't mean trigger rates. I meant that we could mask, for instance, some of the innermost BPix layers, which might be less useful for us; that alone could decrease the object size several-fold. Or we could adjust the threshold to cut some potentially noisy pixels, and so on. But it would be really helpful if you could give an estimate of the reduction factor needed for the object (2 times? 10?).
Not so much for me to answer, but nominally this workflow should run in 2 GB and currently takes ~6 GB.
Personally I’d say this data product should take less than 100MB (that would be 25M entries in the vector) and preferably closer to 10MB.
I've prepared a fix that should reduce readRocCounts by a factor of 3564 (effectively removing the per-bx granularity). readCounts will remain unchanged. PR: https://github.com/cms-sw/cmssw/pull/45348
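Schematically, removing the per-bx granularity amounts to integrating the per-ROC counts over the orbit, along the lines of the sketch below (illustration only, not the actual code of the PR; the roc-major flattening is an assumption):

// Illustration only -- not the actual change in the PR. Integrating the per-ROC counts
// over the 3564 bunch crossings shrinks that vector by exactly that factor.
#include <cstddef>
#include <vector>

constexpr std::size_t kNbx = 3564;  // bunch crossings per LHC orbit

// before: one int per (ROC, bx) -> nRocs * 3564 entries (~600 MB for ~42k ROCs)
// after : one int per ROC       -> nRocs entries        (~170 kB for ~42k ROCs)
std::vector<int> integrateOverBx(const std::vector<int>& rocCountsPerBx, std::size_t nRocs) {
  std::vector<int> rocCounts(nRocs, 0);
  for (std::size_t roc = 0; roc < nRocs; ++roc)
    for (std::size_t bx = 0; bx < kNbx; ++bx)
      rocCounts[roc] += rocCountsPerBx[roc * kNbx + bx];  // assumed roc-major layout
  return rocCounts;
}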
The unmerged files that were the input of the original job will be removed by the usual Tier0 workflow. I copied them to this location so they can be used for testing the fix:
/eos/user/c/cmst0/public/PausedJobs/Run2024F/AlCaHarvest/input
Dear all, the patch went to the CMSSW_14_0_11 release. Once it is tested at T0, please let us know if it resolves the issue.
@cms-sw/alca-l2 Since https://github.com/cms-sw/cmssw/pull/45348 and https://github.com/cms-sw/cmssw/pull/45369 have been merged (long ago), I guess we could close this issue?
Kindly ping @cms-sw/alca-l2 to sign and close the issue. Thanks.
+alca
This issue is fully signed and ready to be closed.
@cmsbuild, please close