cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

L1TMuonEndCapTrackProducer::produce() takes 96 MB memory per stream

Open makortel opened this issue 2 years ago • 13 comments

Live memory profiles of https://github.com/cms-sw/cmssw/issues/40437#issuecomment-1630699980 show that L1TMuonEndCapTrackProducer::produce() takes 100 MB memory / stream

  • 1 thread and stream https://mkortela.web.cern.ch/mkortela/cgi-bin/navigator/issue40437/reco_07.10_live/240
  • 4 threads and streams https://mkortela.web.cern.ch/mkortela/cgi-bin/navigator/issue40437/reco_4th_4st_07.10_live/101

The L1TMuonEndCapTrackProducer module itself is an edm::stream. The memory consumption can be split into

Assuming the PtAssignmentEngine::load() is indeed BDT or similar, does its representation really need to be that large? Ideally all of these would be in GlobalCache of the module (e.g. Tensorflow stuff), or in the EventSetup (e.g. the BDT whose content apparently depends on the L1TMuonEndCapParams and L1TMuonEndCapForest EventSetup data products).

makortel avatar Aug 09 '23 16:08 makortel

assign l1

makortel avatar Aug 09 '23 16:08 makortel

New categories assigned: l1

@epalencia,@aloeliger you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Aug 09 '23 16:08 cmsbuild

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Aug 09 '23 16:08 cmsbuild

Moving the Tensorflow stuff to GlobalCache was also discussed in https://github.com/cms-sw/cmssw/issues/32894

makortel avatar Aug 09 '23 16:08 makortel

Hi @makortel , as you said the main contribution here is loading of the BDT in PtAssignmentEngine::load and coordinate conversion LUTs in SectorProcessorLUT::read. I don't know how we can reduce these easily.

Regarding the Tensorflow stuff and GlobalCache, I currently don't have the time to rework the code unfortunately. If you and/or L1T offline software think that this should be done, I can try to pass this task to someone within the EMTF group and we can see how quickly we can implement this.

eyigitba avatar Aug 09 '23 17:08 eyigitba

Hi @eyigitba I think addressing the PtAssignmentEngine::load() (at least making read-only parts shared across streams; I hope the memory there is mostly read-only) would be important.

The Tensorflow part would be nice (e.g. if the model would become larger in the future, or serving as an example for others), but with 0.5 MB / stream not that important today.

makortel avatar Aug 09 '23 17:08 makortel

Hi @makortel , ok we'll look into the BDT loading and also see how we can improve the Tensorflow part.

How urgent is this btw?

eyigitba avatar Aug 09 '23 18:08 eyigitba

Memory budget is 2GB per stream (including shared component and I/O buffers). L1TMuonEndCapTrackProducer alone accounts for 5% of that. To L1 mngmt to judge how urgent it WAS.

VinInn avatar Aug 10 '23 09:08 VinInn

BTW: why using your own implementation of a Forest and not the highly optimized common CMS one https://cmssdt.cern.ch/dxr/CMSSW/source/CondFormats/GBRForest/interface/GBRForest.h#24 ?

I would advice to switch to that.

VinInn avatar Aug 10 '23 09:08 VinInn

Thanks for the advice @VinInn . I don't know why it was implemented like this, but this code is quite old, from 2016 or so. I unfortunately don't have much time these couple of weeks, but we'll discuss in the EMTF group to come up with a solution soon.

eyigitba avatar Aug 10 '23 10:08 eyigitba

The PtAssignmentEngine::load() came up again in https://github.com/cms-sw/cmssw/issues/46040#issuecomment-2420384665, this time with ~47 MB / stream cost. On an 8-thread/stream PromptReco that corresponds to 376 MB. Avoiding the per-stream copies could potentially save ~329 MB, and moving to GBRForest would likely save more.

In my opinion it would be great if this would be addressed for 2025 data taking.

makortel avatar Oct 17 '24 20:10 makortel

cms-bot internal usage

cmsbuild avatar Oct 17 '24 20:10 cmsbuild

type performance-improvements

makortel avatar Oct 17 '24 20:10 makortel

@eyigitba @cms-sw/l1-l2 Any news?

makortel avatar Dec 09 '24 14:12 makortel

@makortel , not yet from my side. I have a student looking into this but nothing to discuss for now

eyigitba avatar Dec 11 '24 10:12 eyigitba

Came up in https://github.com/cms-sw/cmssw/issues/46975#issuecomment-2557671051

makortel avatar Dec 20 '24 20:12 makortel

@eyigitba @cms-sw/l1-l2 A kind reminder (this came up last yer when we investigated PromptReco jobs using too much memory)

makortel avatar Mar 28 '25 13:03 makortel

Hi @makortel , we have someone that'll look into this but they are currently busy with another task. I hope they'll start the work on this soon

eyigitba avatar Mar 28 '25 14:03 eyigitba

Great!

makortel avatar Mar 28 '25 14:03 makortel