Profiling at T0 AlCa and DQM workflows

Open tvami opened this issue 3 years ago • 24 comments

Follow-up to issue 36282 and the cmsTalk thread https://cms-talk.web.cern.ch/t/high-memory-usage-in-promptreco-jobs-for-run-352516/11040

The issue is that the wf chosen in GitHub issue 36282 is based on the MET dataset, so AlCaHcalHBHEMuonProducer is not run on it (it is attached to the MinBias and SingleMuon PDs).

This is a general issue for testing: certain ALCARECOs belong to certain PDs (as defined in the AlCaRECO matrix). So we either do the testing on several wfs, or pick the one that has the most ALCARECOs connected to it. That would be SingleMuon.
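
As a rough illustration of the "pick the PD with the most ALCARECOs" option, the sketch below selects the best single PD from a matrix and lists what a wf based on it would still not exercise. The PD and ALCARECO names are hypothetical placeholders, not entries from the real AlCaRECO matrix, and the helper names are invented for this example.

```python
# Hypothetical excerpt of an AlCaRECO matrix: PD name -> ALCARECOs attached to it.
# All names below are placeholders, not the real matrix content.
alcareco_matrix = {
    "PD_A": {"AlcaX", "AlcaY"},
    "PD_B": {"AlcaX", "AlcaY", "AlcaZ", "AlcaW"},
    "PD_C": {"AlcaZ", "AlcaV"},
}


def best_single_pd(matrix):
    """Return the PD that has the most ALCARECOs attached to it."""
    return max(matrix, key=lambda pd: len(matrix[pd]))


def uncovered(matrix, pd):
    """Return the ALCARECOs that a wf based only on `pd` would never run."""
    everything = set().union(*matrix.values())
    return everything - matrix[pd]


pd = best_single_pd(alcareco_matrix)
print(pd, sorted(uncovered(alcareco_matrix, pd)))
```

A non-empty `uncovered` set is the argument for testing on several wfs instead of one.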

If that's the preferred solution, we can set up a new wf after the Run3 single muon PD is done (next week?)

tvami avatar Jun 02 '22 13:06 tvami

A new Issue was created by @tvami Tamas Vami.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Jun 02 '22 13:06 cmsbuild

assign alca,reconstruction

tvami avatar Jun 02 '22 13:06 tvami

New categories assigned: dqm,alca

@jfernan2,@ahmad3213,@yuanchao,@micsucmed,@rvenditti,@emanueleusai,@francescobrivio,@malbouis,@tvami,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Jun 02 '22 13:06 cmsbuild

New categories assigned: reconstruction

@jpata,@slava77,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Jun 02 '22 13:06 cmsbuild

How many events are processed for the plots in http://cms-reco-profiling.web.cern.ch/cms-reco-profiling/results/summary_plot_html/CMSSW_12_4_step3_136.889.html ? (I'm confused by the x axis)

makortel avatar Jun 02 '22 13:06 makortel

It's 5k events: https://github.com/cms-sw/cms-bot/blob/master/reco_profiling/profileRunner.py#L95.

The plot has event IDs on the x axis; we need to change that (cc @xoqhdgh1002) so that it shows just numbers.

jpata avatar Jun 02 '22 14:06 jpata

> It's 5k events:

Thanks, good. That should be sufficient for this particular leak (or even a smaller one) to be visible (it would have amounted to ~1.2 GB after 5k events).
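
As a back-of-the-envelope check of that estimate, the sketch below converts the ~1.2 GB after 5k events into a per-event growth rate. It assumes the leak grows linearly with the number of events and takes 1 GB as 1e9 bytes; the function names are made up for this example.

```python
def leak_per_event(total_leak_bytes, n_events):
    """Average memory growth per event, assuming a linear leak."""
    return total_leak_bytes / n_events


def leak_after(rate_bytes_per_event, n_events):
    """Expected accumulated leak after n_events, under the same assumption."""
    return rate_bytes_per_event * n_events


total_leak = 1.2e9  # ~1.2 GB, the figure quoted above (1 GB = 1e9 bytes assumed)
n_events = 5000     # number of events in the profiling job

rate = leak_per_event(total_leak, n_events)
print(f"{rate / 1e3:.0f} kB per event")
print(f"{leak_after(rate, 1000) / 1e6:.0f} MB after 1000 events")
```

Under these assumptions the leak corresponds to roughly 240 kB per event, so it would already show up as a few hundred MB within the first thousand events.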

makortel avatar Jun 02 '22 14:06 makortel

@tvami, should we take any action here? Is there a new workflow we should switch to that would be more representative?

jpata avatar Jun 16 '22 12:06 jpata

Hi @jpata, we could use a run from this Monday. It will likely have limited stats, but if the tests only go up to 5000 events, that could probably be reached.

tvami avatar Jun 16 '22 12:06 tvami

We test about 5k now, so it wouldn't be a big change. Is there a workflow we can try, so that you can see whether the results are useful for ALCA?

jpata avatar Jun 16 '22 14:06 jpata

@tocheng is going to create a new wf that you can use. He promised to look at this tomorrow.

tvami avatar Jun 16 '22 16:06 tvami

> He promised to look at this tomorrow.

@tocheng do you have any updates?

tvami avatar Jun 21 '22 21:06 tvami

@tvami Please let me know if this is what is needed. https://github.com/cms-sw/cmssw/compare/master...tocheng:ALCA_PCL_Run3_CMSSW_12_5_X?expand=1

tocheng avatar Jun 23 '22 10:06 tocheng

Hi @tocheng, these are the ALCARECOs connected to the SingleMuon PD in the AlCaRECO matrix:

SingleMuon: TkAlMuonIsolated, HcalCalIterativePhiSym, MuAlCalIsolatedMu, HcalCalHO, HcalCalHBHEMuonProducerFilter, SiPixelCalSingleMuonLoose, SiPixelCalSingleMuonTight

I think you missed some of them, please add those! Thanks!
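
A quick way to spot the missing ones is a set difference between the SingleMuon list above and whatever the wf currently configures. This is only a sketch: `configured_step` is a made-up example, not the content of the linked branch, and the "+"-separated format is an assumption about how the ALCA step is written.

```python
# ALCARECOs attached to SingleMuon, as listed in this comment.
expected = {
    "TkAlMuonIsolated",
    "HcalCalIterativePhiSym",
    "MuAlCalIsolatedMu",
    "HcalCalHO",
    "HcalCalHBHEMuonProducerFilter",
    "SiPixelCalSingleMuonLoose",
    "SiPixelCalSingleMuonTight",
}

# Hypothetical ALCA step of a wf under review (NOT taken from the actual branch).
configured_step = "TkAlMuonIsolated+MuAlCalIsolatedMu+HcalCalHO"


def missing_alcarecos(step, expected_set):
    """Return the expected ALCARECOs that do not appear in the step string."""
    configured = set(step.split("+"))
    return sorted(expected_set - configured)


for name in missing_alcarecos(configured_step, expected):
    print("missing:", name)
```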

tvami avatar Jun 23 '22 10:06 tvami

And maybe we could add another one: a purely technical wf that adds all the ALCARECOs to the MinBias PD. This would of course be physically incorrect, but it would test everything under one wf.

tvami avatar Jun 23 '22 10:06 tvami

@tocheng please submit the PR; at this point we have good Run-3, 13.6 TeV input data.

tvami avatar Jul 08 '22 11:07 tvami

@tocheng ?

tvami avatar Jul 11 '22 08:07 tvami

Being addressed in https://github.com/cms-sw/cmssw/pull/38681

francescobrivio avatar Jul 11 '22 12:07 francescobrivio

+alca

  • new wf introduced in https://github.com/cms-sw/cmssw/pull/38681

tvami avatar Jul 21 '22 20:07 tvami

@jpata can you please take over from that? Thanks!

tvami avatar Jul 21 '22 20:07 tvami

Thanks! Which of the two new workflows should we use instead of 136.889? From the reco point of view they are all equivalent, so the question is which one has the most representative ALCA configuration.

Note that on the reco side, we basically submit and analyze this 8-threaded profiling job "by hand" for each prerelease - so we don't have the personpower to study a large number of workflows per release at this time.

jpata avatar Aug 02 '22 13:08 jpata

I think you can go ahead with 1001.3, thanks!

tvami avatar Aug 02 '22 14:08 tvami

hi @jpata do you have any update on this? thanks!

tvami avatar Aug 09 '22 17:08 tvami

hi @cms-sw/reconstruction-l2 did this happen in the end?

tvami avatar Oct 03 '22 12:10 tvami

@clacaputo @mandrenguyen hi guys, do you know if 1001.3 is being profiled after all?

tvami avatar Dec 15 '22 02:12 tvami