ginkgo icon indicating copy to clipboard operation
ginkgo copied to clipboard

Add TAU logger

Open upsj opened this issue 2 years ago • 1 comments

The logger adds timers for all important Ginkgo events based on the perfstubs library. To make it work, we also need to forward all logger events to the executor. This is meant to show how we can annotate a profile with Ginkgo-specific information.

An example profile from 10000 iterations in preconditioned-solver.cpp. Not sure what's going on with spmv#3, but this may be related to inclusive vs. exclusive timing. Profile

upsj avatar Jun 17 '22 17:06 upsj

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 594 Changed (42529 filtered out), 182 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

ginkgo-bot avatar Jun 17 '22 18:06 ginkgo-bot

format-rebase!

upsj avatar Dec 08 '22 21:12 upsj

Formatting rebase introduced changes, see Artifacts here to review them

ginkgo-bot avatar Dec 08 '22 21:12 ginkgo-bot

I'm very open to other solutions to the global initialization problem - we have a similar issue with things like MPI_Init/Finalize, where users may or may not want us to take care of initialization. I do like your suggestion a lot, I didn't consider letting the Executor decide whether to receive propagated events. We could potentially also extend that to handle the split between host and device executor, making the workaround for the benchmarks unnecessary? But that would make the host executor the destination for all events, which is usually not that easy to access compared to the device executor, which is used everywhere.

upsj avatar Dec 09 '22 10:12 upsj

On the global object question: I didn't realize until now, in case somebody were to run some kind of weird hybrid solve, it might actually make sense to have multiple different loggers with different properties all providing NVTX ranges. I'll add a bit of color to the nvtx example :)

upsj avatar Dec 09 '22 10:12 upsj

Not using singletons for nvtx and roctx definitely makes sense: We can assign different colors to different executors that way (with a bit of corporate branding 😄 ): Yellow is GPU, gray is CPU

Screenshot 2022-12-09 120621

upsj avatar Dec 09 '22 11:12 upsj

Random thought: A future feature that would work especially well for NVTX would be allowing users to give names to objects: profiler_set_name(linop.get(), "mass_matrix") would probably make the profile much more readable, and be very cheap to implement

upsj avatar Dec 09 '22 22:12 upsj

ROCm profiling also seems to work :) Screenshot from 2022-12-12 12-13-59

upsj avatar Dec 12 '22 11:12 upsj

For future reference, here's how to do the same thing with Intel vTune

https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis/basic-usage-and-configuration.html#basic-usage-and-configuration https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis/basic-usage-and-configuration/configuring-your-build-system.html https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis/instrumentation-tracing-technology-api-reference/task-api.html

upsj avatar Jan 26 '23 14:01 upsj

Codecov Report

Base: 91.45% // Head: 91.23% // Decreases project coverage by -0.22% :warning:

Coverage data is based on head (f3bed43) compared to base (528efb6). Patch coverage: 27.69% of modified lines in pull request are covered.

:exclamation: Current head f3bed43 differs from pull request most recent head 159bd83. Consider uploading reports for the commit 159bd83 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1055      +/-   ##
===========================================
- Coverage    91.45%   91.23%   -0.22%     
===========================================
  Files          556      559       +3     
  Lines        47498    47644     +146     
===========================================
+ Hits         43437    43467      +30     
- Misses        4061     4177     +116     
Impacted Files Coverage Δ
core/device_hooks/cuda_hooks.cpp 42.50% <0.00%> (-4.73%) :arrow_down:
core/device_hooks/hip_hooks.cpp 44.73% <0.00%> (-2.49%) :arrow_down:
core/log/profiler_hook.cpp 0.00% <0.00%> (ø)
core/log/tau.cpp 0.00% <0.00%> (ø)
core/log/vtune.cpp 0.00% <0.00%> (ø)
core/test/stop/criterion.cpp 100.00% <ø> (ø)
include/ginkgo/core/log/stream.hpp 50.00% <ø> (ø)
include/ginkgo/core/base/executor.hpp 72.55% <92.30%> (+1.27%) :arrow_up:
core/test/log/logger.cpp 100.00% <100.00%> (ø)
include/ginkgo/core/log/logger.hpp 96.55% <100.00%> (+0.71%) :arrow_up:
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Jan 29 '23 00:01 codecov[bot]

VTune now also works Screenshot from 2023-01-30 12-01-22 I set it as the default profiler for DPC++, but not for other backends. Any opinions on that?

upsj avatar Jan 30 '23 11:01 upsj

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 291 Changed (57151 filtered out), 306 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

ginkgo-bot avatar Feb 02 '23 11:02 ginkgo-bot