
[WIP] Profiler tags for visualizing BTD in NsightSystem

Open RevathiJambunathan opened this issue 3 years ago • 1 comment

RevathiJambunathan, Sep 23 '22 20:09

Looking at an Nsight profile I made a while ago for HiPACE++, I noticed two things:

  • Properly synchronized function/profiler names are available under Processes/[…] hipace/CUDA HW …/99.7% Context 1/82.4% Stream 17/NVTX. So basically, they show up not in Threads and not in [All Streams], but in the category of the most used Stream(s).

  • The automatic synchronization after an MFIter loop (and likely also a ParIter loop) can take 20-30 µs with one Rank, one GPU and one MF Box. If BTD uses a lot of these loops with little compute load in each, this might be the reason for the slow performance.

AlexanderSinn, Sep 27 '22 18:09