devito icon indicating copy to clipboard operation
devito copied to clipboard

misc: Add MPI0 logging level

Open georgebisbas opened this issue 2 years ago • 5 comments

This logging mode helps to reduce the output of MPI runs to only emit performance numbers once per MPI rank. It is mostly aimed to reduce the redundant verbose performance output.

e.g. instead of:

Allocating host memory for u(2, 52, 72, 72) [2 MB]
Allocating host memory for u(2, 52, 72, 72) [2 MB]
Operator `Kernel` generated in 1.72 s
  * lowering.IET: 0.71 s (41.5 %)
     * specializing.IET: 0.66 s (38.6 %)
        * make_mpi: 0.45 s (26.3 %)
  * lowering.Expressions: 0.56 s (32.8 %)
  * lowering.Clusters: 0.41 s (24.0 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` generated in 1.81 s
  * lowering.IET: 0.72 s (40.0 %)
     * specializing.IET: 0.66 s (36.6 %)
        * make_mpi: 0.45 s (25.0 %)
  * lowering.Expressions: 0.58 s (32.2 %)
  * lowering.Clusters: 0.48 s (26.7 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` ran in 0.01 s
Operator `Kernel` ran in 0.01 s
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Local performance:
Local performance:
  * section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
  * section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
  * section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
  * section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
Norm u is 2325.5583
Norm u is 2325.5583

we get:

Allocating host memory for u(2, 52, 72, 72) [2 MB]
Allocating host memory for u(2, 52, 72, 72) [2 MB]
Operator `Kernel` generated in 1.81 s
  * lowering.IET: 0.73 s (40.6 %)
     * specializing.IET: 0.67 s (37.3 %)
        * make_mpi: 0.46 s (25.6 %)
  * lowering.Expressions: 0.54 s (30.0 %)
  * lowering.Clusters: 0.51 s (28.4 %)
     * specializing.Clusters: 0.39 s (21.7 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` generated in 1.80 s
  * lowering.IET: 0.75 s (41.8 %)
     * specializing.IET: 0.69 s (38.4 %)
        * make_mpi: 0.48 s (26.8 %)
  * lowering.Expressions: 0.56 s (31.2 %)
  * lowering.Clusters: 0.46 s (25.6 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` ran in 0.01 s
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Local performance:
  * section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 11.65 GFlops/s, 0.15 GPts/s]
  * section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 11.67 GFlops/s, 0.15 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
Norm u is 2325.5583
Norm u is 2325.558

georgebisbas avatar May 17 '23 17:05 georgebisbas

Codecov Report

Attention: Patch coverage is 93.75000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 79.46%. Comparing base (8729cd2) to head (e459d52).

Files Patch % Lines
devito/operator/profiling.py 0.00% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2130      +/-   ##
==========================================
+ Coverage   79.44%   79.46%   +0.01%     
==========================================
  Files         232      232              
  Lines       43607    43618      +11     
  Branches     8072     8073       +1     
==========================================
+ Hits        34645    34660      +15     
+ Misses       8207     8203       -4     
  Partials      755      755              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar May 17 '23 18:05 codecov[bot]

To add: -Until Devito v3.5 included, domain decomposition occurs along the fastest axis. As of later versions, domain decomposition occurs along the slowest axis, for performance reasons. And yes, it is possible to control the domain decomposition in user code, but this is not neatly documented. Take a look at test_custom_topology in this file. In essence, Grid accepts the optional argument topology, which allows the user to pass a custom topology as an n-tuple, where n is the number of distributed dimensions. For example, for a two-dimensional grid, the topology (4, 1) will decompose the slowest axis into four partitions, one partition per MPI rank, while the fastest axis will be replicated over all MPI ranks. +Until Devito v3.5 included, domain decomposition occurs along the fastest axis. As of later versions, domain decomposition occurs along the slowest axis, for performance reasons. And yes, it is possible to control the domain decomposition in user code, but this is not neatly documented. Take a look at +class CustomTopology in distributed.py +and test_custom_topology in this file. In essence, Grid accepts the optional argument topology, which allows the user to pass a custom topology as an n-tuple, where n is the number of distributed dimensions. For example, for a two-dimensional grid, the topology (4, 1) will decompose the slowest axis into four partitions, one partition per MPI rank, while the fastest axis will be replicated over all MPI ranks.

georgebisbas avatar Jun 29 '23 13:06 georgebisbas

Would this PR help: https://github.com/devitocodes/devito/pull/2175 to avoid changing the user-level setup and only make a temp switchconfig as per the original plan?

georgebisbas avatar Aug 02 '23 06:08 georgebisbas

Is this PR fine?

georgebisbas avatar Nov 03 '23 17:11 georgebisbas

Converting to draft as apprantly switchcongfig does not work

georgebisbas avatar Dec 20 '23 09:12 georgebisbas