
Gradual slowdown of AMR-Wind solver performance

Open lawrenceccheung opened this issue 1 month ago • 10 comments

Bug description

After running the ExaWind driver or the AMR-Wind solver for tens or hundreds of thousands of iterations, there is sometimes a noticeable slowdown in solver performance. Solve times that were initially on the order of 3-4 sec/iter can grow to 8-9 sec/iter.

This example is a case from @ndevelder using the ExaWind hybrid solver; it shows that the slowdown comes from the AMR-Wind solver alone: image

It also appears in AMR-Wind-only runs, in this case a 9-turbine wind farm case run with OpenFAST coupling: image. Here the typical solve time per iteration starts out at ~0.5 sec/iter and then grows to ~1 sec/iter about 40,000 iterations later. Interestingly, if you restart the case, the solve time goes back to ~0.5 sec/iter before slowly growing again.

Timing data from the log files can be extracted and plotted using

grep WallClockTime log1.txt |gnuplot -p -e "set yr [0:10]; plot '<cat' using 2:6;"

for AMR-Wind log files and

grep "AMR-Wind::Total" log | gnuplot -p -e "set yr [0:10]; plot '<cat' using 2:5;"

for ExaWind log files.
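For a quantitative view of the drift, the same `WallClockTime` lines can be parsed in Python. This is a minimal sketch that assumes the AMR-Wind line format shown in the log excerpt in this issue (`WallClockTime: <step> Pre: ... Solve: ... Post: ... Total: ...`); the function names `parse_wallclock` and `windowed_mean_solve` are hypothetical helpers, not part of AMR-Wind.

```python
import re
from statistics import mean

# Assumed line format (taken from the log excerpt in this issue):
# WallClockTime: 106124 Pre: 0.0393 Solve: 1.014 Post: 0.0176 Total: 1.071
WALLCLOCK_RE = re.compile(
    r"^WallClockTime:\s+(\d+)\s+Pre:\s+(\S+)\s+Solve:\s+(\S+)"
    r"\s+Post:\s+(\S+)\s+Total:\s+(\S+)"
)


def parse_wallclock(lines):
    """Return a list of (step, solve_seconds, total_seconds) tuples."""
    records = []
    for line in lines:
        match = WALLCLOCK_RE.match(line.strip())
        if match:
            step = int(match.group(1))
            solve = float(match.group(3))
            total = float(match.group(5))
            records.append((step, solve, total))
    return records


def windowed_mean_solve(records, window=1000):
    """Average the solve time over consecutive blocks of `window` records.

    Returns (first_step_in_block, mean_solve_seconds) pairs, which smooths
    out per-step noise and makes a slow upward trend easier to see.
    """
    averaged = []
    for start in range(0, len(records), window):
        block = records[start:start + window]
        averaged.append((block[0][0], mean(r[1] for r in block)))
    return averaged
```

A possible usage is `windowed_mean_solve(parse_wallclock(open("log1.txt")))`, which yields one averaged solve time per 1000 steps; a restart should show up as a sudden drop in the averaged value.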

Note that the number of solver iterations remains constant in AMR-Wind. Here is a plot of the MAC and nodal projection iterations required over the length of the run: image
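The iteration counts can be pulled from the log in a similar way. The sketch below assumes the four-column solver summary table shown in the log excerpt in this issue (system name, iterations, initial residual, final residual); `parse_solver_iters` is a hypothetical helper name.

```python
def parse_solver_iters(lines, systems=("MAC_projection", "Nodal_projection")):
    """Collect the per-step iteration counts for the named linear systems.

    Assumes table rows of the form:
      MAC_projection   4   1.706609085   1.653896045e-06
    """
    counts = {name: [] for name in systems}
    for line in lines:
        fields = line.split()
        # A table row has exactly four whitespace-separated fields.
        if len(fields) == 4 and fields[0] in counts:
            try:
                counts[fields[0]].append(int(fields[1]))
            except ValueError:
                # Skip rows whose second column is not an integer.
                continue
    return counts
```

If the lists returned for `MAC_projection` and `Nodal_projection` stay flat while the solve time grows, the slowdown is unlikely to be caused by the linear solvers needing more iterations.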

Note also that the solve process itself seems relatively unaffected; see the before-restart and after-restart snippets from the log file below.

Steps to reproduce

Steps to reproduce the behavior:

  1. Compiler used
    • [ ] GCC
    • [ ] LLVM
    • [ ] oneapi (Intel)
    • [ ] nvcc (NVIDIA)
    • [x] rocm (AMD)
    • [ ] with MPI
    • [x] Clang
  2. Operating system
    • [x] Linux
    • [ ] OSX
    • [ ] Windows
    • [ ] other (do tell ;)):
  3. Hardware:
    • [x] CPU
    • [x] GPU
  4. Machine details: observed this on runs with:
    • Frontier (GPU)
    • Sandia HPC (CPU)
  5. Input file attachments
  6. Error (paste or attach): log output for step 106124 before the restart (first block) and after the restart (second block):
Step: 106124 dt: 0.02 Time: 28009.96 to 28009.98
CFL: 0.292407 (conv: 0.101057 diff: 0 src: 0.236542 )

Godunov:
  System                     Iters      Initial residual        Final residual
  ----------------------------------------------------------------------------
  MAC_projection                 4           1.706609085       1.653896045e-06
  temperature_solve              2       0.0001600051844       1.516013981e-10
  tke_solve                      1        0.001830700216       1.433928062e-06
  velocity_solve                 1        0.002116535213       2.247267522e-06
  Nodal_projection               4           2.918882696       5.527631235e-07

WallClockTime: 106124 Pre: 0.0393 Solve: 1.014 Post: 0.0176 Total: 1.071
Solve time per cell: 9.08e-06
Step: 106124 dt: 0.02 Time: 28009.96 to 28009.98
CFL: 0.292413 (conv: 0.101057 diff: 0 src: 0.236548 )

Godunov:
  System                     Iters      Initial residual        Final residual
  ----------------------------------------------------------------------------
  MAC_projection                 4           1.706301224       1.476950891e-06
  temperature_solve              2       0.0001600004604       1.516582415e-10
  tke_solve                      1        0.001830697993       1.433928058e-06
  velocity_solve                 1        0.002116534853       2.247266377e-06
  Nodal_projection               4           2.918889138       5.520496715e-07

WallClockTime: 106124 Pre: 0.0338 Solve: 0.5783 Post: 0.00379 Total: 0.616
Solve time per cell: 7.768e-06
  7. If this is a segfault, a stack trace from a debug build (paste or attach):
<!-- stack trace -->

AMR-Wind information

Problem has existed since at least the following build:

==============================================================================
                AMR-Wind (https://github.com/exawind/amr-wind)

  AMR-Wind version :: v2.0.0-4-gc70c279e
  AMR-Wind Git SHA :: c70c279eb6901edc4466d6f96f10e522ca6b62f9
  AMReX version    :: 24.03-36-g748f8dfea597

  Exec. time       :: Mon May 27 03:00:45 2024
  Build time       :: May 20 2024 00:00:24
  C++ compiler     :: Clang 15.0.0

  MPI              :: ON    (Num. ranks = 2400)
  GPU              :: ON    (Backend: HIP)
  OpenMP           :: OFF

  Enabled third-party libraries: 
    NetCDF    4.7.4
    HYPRE     2.31.0
    OpenFAST  

lawrenceccheung, Jan 14 '25 05:01