devito icon indicating copy to clipboard operation
devito copied to clipboard

misc: Add MPI0 logging level

Open georgebisbas opened this issue 1 year ago • 5 comments

This logging mode helps to reduce the output of MPI runs to only emit performance numbers once per MPI rank. It is mostly aimed to reduce the redundant verbose performance output.

e.g. instead of:

Allocating host memory for u(2, 52, 72, 72) [2 MB]
Allocating host memory for u(2, 52, 72, 72) [2 MB]
Operator `Kernel` generated in 1.72 s
  * lowering.IET: 0.71 s (41.5 %)
     * specializing.IET: 0.66 s (38.6 %)
        * make_mpi: 0.45 s (26.3 %)
  * lowering.Expressions: 0.56 s (32.8 %)
  * lowering.Clusters: 0.41 s (24.0 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` generated in 1.81 s
  * lowering.IET: 0.72 s (40.0 %)
     * specializing.IET: 0.66 s (36.6 %)
        * make_mpi: 0.45 s (25.0 %)
  * lowering.Expressions: 0.58 s (32.2 %)
  * lowering.Clusters: 0.48 s (26.7 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` ran in 0.01 s
Operator `Kernel` ran in 0.01 s
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Local performance:
Local performance:
  * section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
  * section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
  * section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
  * section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
Norm u is 2325.5583
Norm u is 2325.5583

we get:

Allocating host memory for u(2, 52, 72, 72) [2 MB]
Allocating host memory for u(2, 52, 72, 72) [2 MB]
Operator `Kernel` generated in 1.81 s
  * lowering.IET: 0.73 s (40.6 %)
     * specializing.IET: 0.67 s (37.3 %)
        * make_mpi: 0.46 s (25.6 %)
  * lowering.Expressions: 0.54 s (30.0 %)
  * lowering.Clusters: 0.51 s (28.4 %)
     * specializing.Clusters: 0.39 s (21.7 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` generated in 1.80 s
  * lowering.IET: 0.75 s (41.8 %)
     * specializing.IET: 0.69 s (38.4 %)
        * make_mpi: 0.48 s (26.8 %)
  * lowering.Expressions: 0.56 s (31.2 %)
  * lowering.Clusters: 0.46 s (25.6 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` ran in 0.01 s
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Local performance:
  * section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 11.65 GFlops/s, 0.15 GPts/s]
  * section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 11.67 GFlops/s, 0.15 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
Norm u is 2325.5583
Norm u is 2325.558

georgebisbas avatar May 17 '23 17:05 georgebisbas