No execution time comparison available for PRs

Open mandrenguyen opened this issue 2 years ago • 42 comments

For the past few months we have not been able to see the CPU impact of a given pull request, which used to be possible with the enable profiling option in the Jenkins tests. This is problematic for integrating new features, as we cannot easily keep track of changes in performance until a pre-release is built. The issue seems to come from igprof, which apparently can no longer really be supported. One suggestion from @gartung is to try to move to VTune.

mandrenguyen avatar Nov 02 '23 05:11 mandrenguyen

assign core, reconstruction

mandrenguyen avatar Nov 02 '23 05:11 mandrenguyen

New categories assigned: core,reconstruction

@Dr15Jones,@jfernan2,@makortel,@mandrenguyen,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Nov 02 '23 05:11 cmsbuild

A new Issue was created by @mandrenguyen Matthew Nguyen.

@rappoccio, @antoniovilela, @sextonkennedy, @makortel, @smuzaffar, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Nov 02 '23 05:11 cmsbuild

The problem is that the cmsRun process itself gets a segfault while being profiled by IgProf. The same segfault might happen when being profiled with VTune.

gartung avatar Nov 02 '23 13:11 gartung

In case IgProf+cmsRun combination crashes, is any information on the job timings saved that can be used for comparison?

makortel avatar Nov 02 '23 13:11 makortel

Usually the FastTimerService job completes, and the per-module averages are contained in the raw JSON file even if the resources pie chart is not readable.

gartung avatar Nov 02 '23 14:11 gartung
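
As an illustration of reading those per-module averages out of the raw JSON, here is a minimal Python sketch. The field names used ("modules", "type", "label", "events", "time_real") are assumptions about the resources file layout and may need adapting to the actual output:

```python
#!/usr/bin/env python3
# Minimal sketch: print the average real time per event for each module
# from a FastTimerService resources JSON file. The field names used here
# ("modules", "type", "label", "events", "time_real") are assumptions
# about the file layout.
import json
import sys

def per_module_averages(path):
    with open(path) as f:
        data = json.load(f)
    for module in data.get("modules", []):
        events = module.get("events", 0)
        if events:
            # average real time per event, in whatever units the file uses
            yield module.get("type", ""), module.get("label", ""), module["time_real"] / events

if __name__ == "__main__":
    for mtype, label, avg in sorted(per_module_averages(sys.argv[1]), key=lambda x: -x[2]):
        print(f"{avg:10.3f}  {mtype}/{label}")
```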

The IgProfService dumps the profile after the first, middle, and next-to-last event. The first one might not have enough data to be meaningful.

gartung avatar Nov 02 '23 14:11 gartung

@mandrenguyen Can you point me to a PR so I can look at the logs?

gartung avatar Nov 02 '23 14:11 gartung

@gartung The last one we tried with profiling enabled was: https://github.com/cms-sw/cmssw/pull/43107

mandrenguyen avatar Nov 02 '23 16:11 mandrenguyen

The crashes under profilers are quite likely caused by the memory corruption inside TensorFlow (when run through IgProf or VTune) that has been investigated in https://github.com/cms-sw/cmssw/issues/42444.

makortel avatar Dec 13 '23 17:12 makortel

The FastTimerService should suffice. Still, it seems not to be active in RelVals.

VinInn avatar Dec 15 '23 13:12 VinInn

The issue seems to come from igprof, which apparently can no longer really be supported. [...] is to try to move to VTune.

For my education, is this replacement documented somewhere? I still see igprof listed in the RecoIntegration CMS TWiki.

mmusich avatar Aug 26 '24 14:08 mmusich

@mmusich it is expected that VTune would give the same problem as igprof, so the replacement has not been done. Indeed, this is a real showstopper for RECO developments, since we cannot monitor the time profiling in PRs.

jfernan2 avatar Sep 03 '24 10:09 jfernan2

so the replacement has not been done. Indeed, this is a real showstopper for RECO developments, since we cannot monitor the time profiling in PRs.

I see, that's bad news. I gather the same holds true for user checks when developing (regardless of the time profiling in PRs).

mmusich avatar Sep 03 '24 10:09 mmusich

Is the most burning problem that there is no timing information (entire job, per module) or that the real IgProf/VTune profile (with function-level information) is missing (because of crash)?

makortel avatar Sep 05 '24 19:09 makortel

Is the most burning problem that there is no timing information (entire job, per module) or that the real IgProf/VTune profile (with function-level information) is missing (because of crash)?

For me personally, at least, having the function-level information would be really helpful.

mmusich avatar Sep 06 '24 17:09 mmusich

IMHO the crash of IgProf/VTune is a problem even though there is timing info from the FastTimer module, but the real issue is not having a comparison of baseline time performance vs. baseline+PR. This forces us to detect total increases a posteriori in the profiles once a pre-release is built, and then figure out which PR(s) were responsible...

Perhaps a comparison script based on FastTimer output could be useful even if not optimal. Do you think this is possible, @gartung? Thanks

jfernan2 avatar Sep 06 '24 20:09 jfernan2

Yes, it would be possible. In fact, there is already a script, https://github.com/fwyzard/circles/blob/master/scripts/merge.py, that merges two FastTimer output files. Changing it to diff two files should be possible.

gartung avatar Sep 10 '24 17:09 gartung

You can try this script

https://raw.githubusercontent.com/gartung/circles/master/scripts/diff.py

gartung avatar Sep 10 '24 18:09 gartung
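
To illustrate the idea behind such a diff, here is a minimal sketch comparing per-module real time between a baseline and a PR resources file. This is not the actual diff.py; the JSON field names are assumptions about the FastTimerService output layout:

```python
#!/usr/bin/env python3
# Minimal sketch of the diff idea: compare per-module real time between a
# baseline and a PR resources JSON file. Field names ("modules", "type",
# "label", "events", "time_real") are assumptions about the file layout.
import json
import sys

def load_averages(path):
    with open(path) as f:
        data = json.load(f)
    return {
        (m.get("type", ""), m.get("label", "")): m["time_real"] / m["events"]
        for m in data.get("modules", [])
        if m.get("events")
    }

def diff(baseline_path, pr_path):
    base = load_averages(baseline_path)
    pr = load_averages(pr_path)
    for key in sorted(set(base) | set(pr)):
        b = base.get(key, 0.0)
        p = pr.get(key, 0.0)
        rel = (p - b) / b * 100.0 if b else float("inf")
        yield key, b, p, p - b, rel

if __name__ == "__main__":
    for (mtype, label), b, p, delta, rel in diff(sys.argv[1], sys.argv[2]):
        print(f"{b:10.3f} {p:10.3f} {delta:+10.3f} {rel:+8.1f}%  {mtype}/{label}")
```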

If you add enable profiling as a comment on the pull request, the FastTimerService is run as part of the profiling.

gartung avatar Sep 10 '24 18:09 gartung

Thanks, but the real need is to have the comparison in the PR, to see the changes introduced, the same way we had with igprof.

At this point, enable profiling only runs the FastTimer on the PR framework, but gives no time comparison, which is what allows us to decide.

jfernan2 avatar Sep 19 '24 18:09 jfernan2

I am working on a pull request to cms-bot that will add the diff.py script and run it when enable profiling is commented. The PR timing and the diff of the PR timing vs the IB release will be linked in the summary. https://github.com/cms-sw/cms-bot/pull/2343

gartung avatar Sep 26 '24 16:09 gartung

This script has been added to the pull request profiling and produces an HTML table of all the modules in the resources file and their differences: https://raw.githubusercontent.com/cms-sw/cms-bot/refs/heads/master/comparisons/resources-diff.py

gartung avatar Oct 01 '24 14:10 gartung

Thank you very much, @gartung. I will give it a try and give you feedback.

jfernan2 avatar Oct 01 '24 18:10 jfernan2

At present I am testing it on: https://github.com/cms-sw/cmssw/pull/46133

jfernan2 avatar Oct 01 '24 19:10 jfernan2

@gartung the new FastTimer diff script is working fine, see for example [1][2]. I have some feedback:

  • What about a different color legend for positive and negative values, so that the minus sign is not overlooked?
  • Could we have some kind of summary or alarm that would spot salient changes (those in red, for example), so that one does not have to scroll through the whole table looking for differences? (A minimal sketch of this idea follows after this comment.)
  • I am afraid 10 events are too few for reliable tests; due to initialization times reading inputs we are having too many false positives with negative values (baseline always faster?) of up to -80%. I'd suggest moving the profiling tests to at least 100 events.
  • Should we remove the links to the igprof differences, which have been empty for a long time? See [3]

Thank you very much

[1] https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-232398/41851/profiling/29834.21/diff-step3_cpu.resources.json.html
[2] https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5463d/41850/profiling/29834.21/diff-step3_cpu.resources.json.html
[3] https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5463d/41850/profiling/index-29834.21.html

jfernan2 avatar Oct 02 '24 08:10 jfernan2
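
Regarding the summary or alarm suggestion above, a minimal sketch of what such a filter could look like is shown here. The 10% relative and 1 ms absolute thresholds are arbitrary illustration values, and the diff() generator refers to the sketch shown earlier in this thread:

```python
# Minimal sketch of the "summary or alarm" idea: reuse per-module
# differences (e.g. from a diff like the one sketched earlier) and report
# only the salient ones. The thresholds are arbitrary illustration values.
def salient_changes(diffs, rel_threshold=10.0, abs_threshold=1.0):
    """diffs: iterable of (key, baseline, pr, delta, rel_percent) tuples."""
    for key, baseline, pr, delta, rel in diffs:
        if abs(rel) >= rel_threshold and abs(delta) >= abs_threshold:
            yield key, baseline, pr, delta, rel

# Example usage, assuming the diff() generator from the earlier sketch:
# for (mtype, label), b, p, d, r in salient_changes(diff("baseline.json", "pr.json")):
#     print(f"ALERT {mtype}/{label}: {b:.1f} -> {p:.1f} ({r:+.1f}%)")
```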

due to initialization times reading inputs we are having too many false positives

Do you mean something along the lines of the baseline being run first, and that leading to the input file being cached in memory?

If we are to go to that level of precision, I'd suggest dd-ing the input file to /dev/null first (to cache it in memory) and running one warmup job (could be the same number of events as the actual profiling job, or fewer) prior to the actual measurement.

makortel avatar Oct 02 '24 15:10 makortel
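
A minimal sketch of this warm-up procedure, assuming a hypothetical step3.py configuration and step2.root input file (placeholders, not part of any existing bot machinery):

```python
# Minimal sketch of the warm-up suggestion above: read the input file once
# (equivalent in spirit to "dd if=... of=/dev/null") so it is cached in
# memory, then run a warm-up cmsRun job before the measured one. The
# config and input file names are placeholders.
import subprocess

def warm_file_cache(path, block_size=64 * 1024 * 1024):
    # read the file in blocks to pull it into the page cache
    with open(path, "rb") as f:
        while f.read(block_size):
            pass

def run_profiling(config="step3.py", input_file="step2.root"):
    warm_file_cache(input_file)
    subprocess.run(["cmsRun", config], check=True)  # warm-up job
    subprocess.run(["cmsRun", config], check=True)  # the measured job
```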

In the table, I'd suggest also adding units to both time and memory, and in the cells presenting first the baseline and then the PR value (but keeping the difference as "PR - baseline", as we do in the maxmemory table).

makortel avatar Oct 02 '24 15:10 makortel
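
A minimal sketch of such a cell format, with the unit passed in explicitly (ms or MB assumed, rather than taken from the actual resources file):

```python
# Minimal sketch of the table-cell convention suggested above: show the
# baseline value first, then the PR value, with explicit units, and keep
# the difference as "PR - baseline". Units are an assumption.
def format_cell(baseline, pr, unit):
    delta = pr - baseline
    return f"{baseline:.1f} {unit} -> {pr:.1f} {unit} ({delta:+.1f} {unit})"

# e.g. format_cell(12.3, 14.0, "ms") -> "12.3 ms -> 14.0 ms (+1.7 ms)"
```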

due to initialization times reading inputs we are having too many false positives

Do you mean something along the lines of the baseline being run first, and that leading to the input file being cached in memory?

If we are to go to that level of precision, I'd suggest dd-ing the input file to /dev/null first (to cache it in memory) and running one warmup job (could be the same number of events as the actual profiling job, or fewer) prior to the actual measurement.

I would not seek precision, but something that allows us to tell whether there is a real change or not. It seems to me that with 10 events we are left with statistical fluctuations, which give more than a 3-4% difference in about 90% of the modules being compared (orange everywhere).

jfernan2 avatar Oct 02 '24 17:10 jfernan2

Moreover, fluctuations seem to make some modules increase and others decrease, so perhaps a global total value of time per event as a summary makes more sense, if we cannot get rid of these ups and downs.

jfernan2 avatar Oct 02 '24 18:10 jfernan2
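
A minimal sketch of such a per-event summary, assuming a "total" block with "events" and "time_real" keys in the FastTimerService resources JSON (the values are in whatever units the file uses):

```python
# Minimal sketch of the "global total time per event" summary suggested
# above: instead of comparing noisy per-module numbers, compare one number
# per job. The "total"/"events"/"time_real" keys are assumptions about the
# FastTimerService resources JSON layout.
import json

def total_time_per_event(path):
    with open(path) as f:
        data = json.load(f)
    total = data["total"]
    return total["time_real"] / total["events"]

def summarize(baseline_path, pr_path):
    base = total_time_per_event(baseline_path)
    pr = total_time_per_event(pr_path)
    rel = (pr - base) / base * 100.0
    # values are in whatever units the resources file uses
    print(f"total time/event  baseline: {base:.1f}  PR: {pr:.1f}  diff: {pr - base:+.1f} ({rel:+.1f}%)")
```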