devito
devito copied to clipboard
misc: Add MPI0 logging level
This logging mode helps to reduce the output of MPI runs to only emit performance numbers once per MPI rank. It is mostly aimed to reduce the redundant verbose performance output.
e.g. instead of:
Allocating host memory for u(2, 52, 72, 72) [2 MB]
Allocating host memory for u(2, 52, 72, 72) [2 MB]
Operator `Kernel` generated in 1.72 s
* lowering.IET: 0.71 s (41.5 %)
* specializing.IET: 0.66 s (38.6 %)
* make_mpi: 0.45 s (26.3 %)
* lowering.Expressions: 0.56 s (32.8 %)
* lowering.Clusters: 0.41 s (24.0 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` generated in 1.81 s
* lowering.IET: 0.72 s (40.0 %)
* specializing.IET: 0.66 s (36.6 %)
* make_mpi: 0.45 s (25.0 %)
* lowering.Expressions: 0.58 s (32.2 %)
* lowering.Clusters: 0.48 s (26.7 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` ran in 0.01 s
Operator `Kernel` ran in 0.01 s
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Local performance:
Local performance:
* section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
* section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
* section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
* section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 10.95 GFlops/s, 0.14 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
Norm u is 2325.5583
Norm u is 2325.5583
we get:
Allocating host memory for u(2, 52, 72, 72) [2 MB]
Allocating host memory for u(2, 52, 72, 72) [2 MB]
Operator `Kernel` generated in 1.81 s
* lowering.IET: 0.73 s (40.6 %)
* specializing.IET: 0.67 s (37.3 %)
* make_mpi: 0.46 s (25.6 %)
* lowering.Expressions: 0.54 s (30.0 %)
* lowering.Clusters: 0.51 s (28.4 %)
* specializing.Clusters: 0.39 s (21.7 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` generated in 1.80 s
* lowering.IET: 0.75 s (41.8 %)
* specializing.IET: 0.69 s (38.4 %)
* make_mpi: 0.48 s (26.8 %)
* lowering.Expressions: 0.56 s (31.2 %)
* lowering.Clusters: 0.46 s (25.6 %)
Flops reduction after symbolic optimization: [209 --> 90]
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` fetched `/tmp/devito-jitcache-uid1000/562b2d8ee0a6d254108e1dfb5522a645f2d8d8e2.c` in 0.06 s from jit-cache
Operator `Kernel` ran in 0.01 s
Global performance: [OI=2.81, 11.16 GFlops/s, 0.14 GPts/s]
Local performance:
* section0[rank0]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 11.65 GFlops/s, 0.15 GPts/s]
* section0[rank1]<21,2,5,8,8,40> ran in 0.01 s [OI=2.81, 11.67 GFlops/s, 0.15 GPts/s]
Performance[mode=advanced] arguments: {'x0_blk0_size': 8, 'y0_blk0_size': 8}
Norm u is 2325.5583
Norm u is 2325.558