[Run3 PromptReco] TMVA in `BaseMVAValueMapProducer<pat::{Electron,Muon}>` uses 11 MB memory per stream

Open makortel opened this issue 1 year ago • 8 comments

From https://github.com/cms-sw/cmssw/issues/46040#issuecomment-2420384665

The function reco::details::loadTMVAWeights() uses 11 MB of memory per stream (from comparing the 1 thread/stream profile with the 4 thread/stream profile), corresponding to 87 MB in an 8-thread PromptReco job.

The function is called by the constructors of BaseMVAValueMapProducer<pat::Electron> and BaseMVAValueMapProducer<pat::Muon>.

Avoiding the replication would reduce PromptReco memory by 76 MB. I don't know if TMVA is thread safe (i.e. if it could be easily moved to an edm::GlobalCache). If it is not, then probably some other inference engine should be considered.
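To illustrate the kind of sharing meant here, below is a minimal, framework-free sketch. It assumes the model can be evaluated through a `const` method that touches no mutable state, which is exactly the open question for TMVA. `ToyModel`, `StreamWorker` and `makeWorkers` are hypothetical names for illustration only. They are not CMSSW or TMVA types, and the sketch does not use the actual `edm::GlobalCache` API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded MVA model (not a TMVA or CMSSW type).
struct ToyModel {
  std::vector<float> weights;

  // const evaluation: reads only immutable data, so concurrent callers
  // would not need a lock. A real inference engine must guarantee this.
  float evaluate(const std::vector<float>& inputs) const {
    float sum = 0.f;
    for (std::size_t i = 0; i < weights.size() && i < inputs.size(); ++i) {
      sum += weights[i] * inputs[i];
    }
    return sum;
  }
};

// Hypothetical per-stream module instance: it holds a pointer to the
// shared model instead of loading its own copy in the constructor.
class StreamWorker {
public:
  explicit StreamWorker(std::shared_ptr<const ToyModel> model) : model_(std::move(model)) {}

  float process(const std::vector<float>& inputs) const { return model_->evaluate(inputs); }

  const ToyModel* modelAddress() const { return model_.get(); }

private:
  std::shared_ptr<const ToyModel> model_;
};

// Creates one worker per stream, all referring to the same model object.
std::vector<StreamWorker> makeWorkers(unsigned nStreams, const std::shared_ptr<const ToyModel>& model) {
  std::vector<StreamWorker> workers;
  workers.reserve(nStreams);
  for (unsigned i = 0; i < nStreams; ++i) {
    workers.emplace_back(model);
  }
  return workers;
}
```

With this layout the model is held in memory once regardless of the number of streams. If the evaluation needs mutable scratch state, as may be the case for TMVA::Reader, the pattern does not apply without locks or per-stream copies.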

makortel avatar Oct 18 '24 18:10 makortel

A new Issue was created by @makortel.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Oct 18 '24 18:10 cmsbuild

assign PhysicsTools/PatAlgos

makortel avatar Oct 18 '24 18:10 makortel

New categories assigned: reconstruction,xpog

@ftorrresd,@hqucms,@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Oct 18 '24 18:10 cmsbuild

assign CommonTools/MVAUtils

makortel avatar Oct 18 '24 18:10 makortel

assign ml

makortel avatar Oct 18 '24 18:10 makortel

New categories assigned: ml

@valsdav,@y19y19 you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Oct 18 '24 18:10 cmsbuild

type performance-improvements

makortel avatar Oct 18 '24 18:10 makortel

TMVA::Reader is not thread safe. There is a new interface called RBDT; see https://root-forum.cern.ch/t/rdataframe-multithreading-loses-events/49338/9

I can check if we can convert these models without changes.

valsdav avatar Oct 24 '24 09:10 valsdav

The thread had a further comment https://root-forum.cern.ch/t/rdataframe-multithreading-loses-events/49338/11

> Actually looking at the code, I see that there should be a lock guard in the RReader::Compute for protecting multiple model evaluations. I will look into this why it is not thread safe.

Do you know if this was resolved? (we want to avoid locks)

makortel avatar Oct 24 '24 12:10 makortel