Proof of concept: TrixiMPIArray

Open ranocha opened this issue 2 years ago • 14 comments

This is a rough draft of a possible MPI array type. A lot of TODO notes are left in the draft at the moment.

Partially implemented in a reduced version (only ode_norm and ode_unstable_check) in #1113. We will use this reduced version for now and see how it works in the wild.
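To illustrate why `ode_norm` needs special treatment under MPI, here is a minimal sketch, not the code from #1113. It assumes the error norm is the root-mean-square norm `sqrt(sum(abs2, u) / length(u))` that OrdinaryDiffEq.jl uses by default for arrays. The MPI ranks are only simulated by a vector of local arrays, and the names `local_contribution` and `global_rms_norm` are hypothetical. In a real run, the two sums would be one global reduction over all ranks (e.g. an `MPI.Allreduce` with `+`).

```julia
# Sketch only: each "rank" is simulated by one entry of a Vector of local arrays.
# In a real MPI run, the two sums in `global_rms_norm` would be a global reduction.

# Per-rank contribution: local sum of squares and local number of entries.
local_contribution(u_local) = (sum(abs2, u_local), length(u_local))

# Global RMS norm assembled from the per-rank contributions.
function global_rms_norm(local_parts)
    contributions = map(local_contribution, local_parts)
    global_sumabs2 = sum(first, contributions)
    global_length = sum(last, contributions)
    return sqrt(global_sumabs2 / global_length)
end

# Hypothetical data: one global vector split unevenly over two simulated ranks.
u_global = collect(1.0:10.0)
parts = [u_global[1:3], u_global[4:10]]

serial_norm = sqrt(sum(abs2, u_global) / length(u_global))
parallel_norm = global_rms_norm(parts)
naive_rank1_norm = sqrt(sum(abs2, parts[1]) / length(parts[1]))

println("serial norm:        ", serial_norm)
println("reduced norm:       ", parallel_norm)
println("rank-1 local norm:  ", naive_rank1_norm)
```

If every rank used only its local norm, the ranks could arrive at different error estimates and therefore at different step-size decisions in an adaptive solver, which is why both the sum of squares and the length have to be reduced globally.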

TODO:

  • [ ] Local reductions (sum): document this in the docstring, or test whether we could also just use a local mapreduce and a parallel ode_norm?
  • [ ] Check step rejections
  • [ ] Check some complex setups (MPI shock capturing does not use alpha smoothing! but everything else should work, incl. AMR)
  • [ ] Maybe compare the performance of serial vs. one MPI rank (needs some hacks, mpi_parallel and mpi_isparallel)

Closes #329; closes #339

ranocha • Mar 30 '22 15:03

Codecov Report

Merging #1104 (cdcf828) into main (1b604a6) will increase coverage by 0.00%. The diff coverage is 98.81%.

@@           Coverage Diff           @@
##             main    #1104   +/-   ##
=======================================
  Coverage   96.75%   96.75%           
=======================================
  Files         303      305    +2     
  Lines       23876    23931   +55     
=======================================
+ Hits        23099    23153   +54     
- Misses        777      778    +1     
Flag Coverage Δ
unittests 96.75% <98.81%> (+<0.01%) :arrow_up:

Impacted Files                                Coverage Δ
src/Trixi.jl                                  66.67% <ø> (ø)
src/callbacks_step/save_restart_dg.jl         89.36% <ø> (ø)
src/callbacks_step/save_solution_dg.jl        95.89% <ø> (ø)
src/auxiliary/mpi_arrays.jl                   97.92% <97.92%> (ø)
src/callbacks_step/amr.jl                     97.07% <100.00%> (ø)
src/callbacks_step/analysis_dg2d_parallel.jl  100.00% <100.00%> (ø)
src/callbacks_step/stepsize_dg2d.jl           100.00% <100.00%> (ø)
src/callbacks_step/stepsize_dg3d.jl           100.00% <100.00%> (ø)
src/callbacks_step/time_series_dg2d.jl        100.00% <100.00%> (ø)
src/meshes/meshes.jl                          100.00% <100.00%> (ø)
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 1b604a6...cdcf828.

codecov[bot] • Mar 30 '22 18:03

Some results from 987407e8

julia --check-bounds=no --threads=2

julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_ec.jl", tspan=(0.0, 10.0))

julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              2.73s /  90.4%           23.5MiB /  97.3%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       4.24k    2.36s   95.9%   558μs   7.57MiB   33.1%  1.83KiB
   volume integral          4.24k    1.94s   78.6%   458μs   1.16MiB    5.1%     288B
   interface flux           4.24k    251ms   10.2%  59.2μs   1.62MiB    7.1%     400B
   prolong2interfaces       4.24k   58.2ms    2.4%  13.7μs   0.97MiB    4.2%     240B
   surface integral         4.24k   56.3ms    2.3%  13.3μs   1.23MiB    5.4%     304B
   reset ∂u/∂t              4.24k   28.3ms    1.1%  6.68μs     0.00B    0.0%    0.00B
   Jacobian                 4.24k   22.3ms    0.9%  5.27μs   1.10MiB    4.8%     272B
   ~rhs!~                   4.24k   8.06ms    0.3%  1.90μs   1.50MiB    6.5%     370B
   prolong2boundaries       4.24k    251μs    0.0%  59.2ns     0.00B    0.0%    0.00B
   prolong2mortars          4.24k    177μs    0.0%  41.7ns     0.00B    0.0%    0.00B
   mortar flux              4.24k    145μs    0.0%  34.3ns     0.00B    0.0%    0.00B
   source terms             4.24k   91.7μs    0.0%  21.6ns     0.00B    0.0%    0.00B
   boundary flux            4.24k   87.0μs    0.0%  20.5ns     0.00B    0.0%    0.00B
 calculate dt                 848   50.1ms    2.0%  59.0μs     0.00B    0.0%    0.00B
 analyze solution              10   30.6ms    1.2%  3.06ms    174KiB    0.7%  17.4KiB
 I/O                           11   20.9ms    0.8%  1.90ms   15.1MiB   66.1%  1.38MiB
   save solution               10   20.7ms    0.8%  2.07ms   15.1MiB   66.0%  1.51MiB
   get element variables       10   97.2μs    0.0%  9.72μs   20.6KiB    0.1%  2.06KiB
   ~I/O~                       11   26.0μs    0.0%  2.37μs   7.20KiB    0.0%     671B
   save mesh                   10    785ns    0.0%  78.5ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

julia> sol = solve(ode, RDPK3SpFSAL35(), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              1.43s /  81.7%           15.5MiB /  86.5%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       2.35k    1.14s   97.7%   487μs   4.20MiB   31.4%  1.83KiB
   volume integral          2.35k    924ms   79.0%   394μs    660KiB    4.8%     288B
   interface flux           2.35k    121ms   10.3%  51.3μs    917KiB    6.7%     400B
   prolong2interfaces       2.35k   32.3ms    2.8%  13.7μs    550KiB    4.0%     240B
   surface integral         2.35k   30.9ms    2.6%  13.1μs    697KiB    5.1%     304B
   reset ∂u/∂t              2.35k   17.4ms    1.5%  7.42μs     0.00B    0.0%    0.00B
   Jacobian                 2.35k   12.9ms    1.1%  5.51μs    624KiB    4.6%     272B
   ~rhs!~                   2.35k   4.41ms    0.4%  1.88μs    853KiB    6.2%     372B
   prolong2boundaries       2.35k    158μs    0.0%  67.3ns     0.00B    0.0%    0.00B
   prolong2mortars          2.35k    104μs    0.0%  44.2ns     0.00B    0.0%    0.00B
   mortar flux              2.35k   79.6μs    0.0%  33.9ns     0.00B    0.0%    0.00B
   source terms             2.35k   54.2μs    0.0%  23.1ns     0.00B    0.0%    0.00B
   boundary flux            2.35k   50.1μs    0.0%  21.3ns     0.00B    0.0%    0.00B
 analyze solution               6   18.3ms    1.6%  3.05ms    105KiB    0.8%  17.5KiB
 I/O                            7   9.09ms    0.8%  1.30ms   9.08MiB   67.8%  1.30MiB
   save solution                6   9.00ms    0.8%  1.50ms   9.06MiB   67.7%  1.51MiB
   get element variables        6   73.3μs    0.0%  12.2μs   12.4KiB    0.1%  2.06KiB
   ~I/O~                        7   16.2μs    0.0%  2.31μs   5.20KiB    0.0%     761B
   save mesh                    6    448ns    0.0%  74.7ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

tmpi 2 julia --check-bounds=no --threads=1

julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_ec.jl", tspan=(0.0, 10.0))

julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              2.72s /  95.4%           19.2MiB /  98.0%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       4.24k    2.49s   95.9%   588μs   3.44MiB   18.3%     852B
   volume integral          4.24k    2.01s   77.4%   475μs     0.00B    0.0%    0.00B
   interface flux           4.24k    277ms   10.6%  65.3μs     0.00B    0.0%    0.00B
   surface integral         4.24k   55.9ms    2.1%  13.2μs     0.00B    0.0%    0.00B
   prolong2interfaces       4.24k   52.2ms    2.0%  12.3μs     0.00B    0.0%    0.00B
   reset ∂u/∂t              4.24k   23.0ms    0.9%  5.44μs     0.00B    0.0%    0.00B
   Jacobian                 4.24k   19.6ms    0.8%  4.64μs     0.00B    0.0%    0.00B
   MPI interface flux       4.24k   13.6ms    0.5%  3.22μs     0.00B    0.0%    0.00B
   ~rhs!~                   4.24k   11.8ms    0.5%  2.79μs   1.70MiB    9.0%     420B
   finish MPI receive       4.24k   11.4ms    0.4%  2.68μs    530KiB    2.8%     128B
   start MPI send           4.24k   9.67ms    0.4%  2.28μs    397KiB    2.1%    96.0B
   prolong2mpiinterfaces    4.24k   3.17ms    0.1%   749ns     0.00B    0.0%    0.00B
   finish MPI send          4.24k   1.03ms    0.0%   243ns    596KiB    3.1%     144B
   start MPI receive        4.24k    912μs    0.0%   215ns    265KiB    1.4%    64.0B
   prolong2mortars          4.24k    286μs    0.0%  67.5ns     0.00B    0.0%    0.00B
   prolong2boundaries       4.24k    256μs    0.0%  60.3ns     0.00B    0.0%    0.00B
   MPI mortar flux          4.24k    224μs    0.0%  52.8ns     0.00B    0.0%    0.00B
   prolong2mpimortars       4.24k    210μs    0.0%  49.6ns     0.00B    0.0%    0.00B
   mortar flux              4.24k    148μs    0.0%  35.0ns     0.00B    0.0%    0.00B
   boundary flux            4.24k   91.0μs    0.0%  21.5ns     0.00B    0.0%    0.00B
   source terms             4.24k   75.2μs    0.0%  17.8ns     0.00B    0.0%    0.00B
 calculate dt                 848   70.5ms    2.7%  83.2μs   79.5KiB    0.4%    96.0B
 analyze solution              10   22.1ms    0.9%  2.21ms   2.61MiB   13.9%   267KiB
 I/O                           11   14.6ms    0.6%  1.33ms   12.6MiB   67.4%  1.15MiB
   save solution               10   14.4ms    0.6%  1.44ms   12.6MiB   67.2%  1.26MiB
   get element variables       10    178μs    0.0%  17.8μs   23.0KiB    0.1%  2.30KiB
   ~I/O~                       11   21.5μs    0.0%  1.95μs   7.20KiB    0.0%     671B
   save mesh                   10    991ns    0.0%  99.1ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

julia> sol = solve(ode, RDPK3SpFSAL35(), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              1.44s /  87.5%           12.3MiB /  90.0%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       2.35k    1.23s   98.1%   525μs   1.91MiB   17.3%     855B
   volume integral          2.35k    978ms   77.7%   416μs     0.00B    0.0%    0.00B
   interface flux           2.35k    135ms   10.7%  57.4μs     0.00B    0.0%    0.00B
   surface integral         2.35k   31.0ms    2.5%  13.2μs     0.00B    0.0%    0.00B
   prolong2interfaces       2.35k   30.3ms    2.4%  12.9μs     0.00B    0.0%    0.00B
   reset ∂u/∂t              2.35k   12.6ms    1.0%  5.37μs     0.00B    0.0%    0.00B
   finish MPI receive       2.35k   11.5ms    0.9%  4.90μs    294KiB    2.6%     128B
   Jacobian                 2.35k   11.2ms    0.9%  4.77μs     0.00B    0.0%    0.00B
   MPI interface flux       2.35k   7.86ms    0.6%  3.35μs     0.00B    0.0%    0.00B
   ~rhs!~                   2.35k   7.16ms    0.6%  3.05μs    969KiB    8.6%     423B
   start MPI send           2.35k   5.48ms    0.4%  2.33μs    220KiB    1.9%    96.0B
   prolong2mpiinterfaces    2.35k   1.91ms    0.2%   813ns     0.00B    0.0%    0.00B
   finish MPI send          2.35k    712μs    0.1%   303ns    330KiB    2.9%     144B
   start MPI receive        2.35k    547μs    0.0%   233ns    147KiB    1.3%    64.0B
   prolong2mortars          2.35k    184μs    0.0%  78.6ns     0.00B    0.0%    0.00B
   prolong2mpimortars       2.35k    161μs    0.0%  68.7ns     0.00B    0.0%    0.00B
   prolong2boundaries       2.35k    154μs    0.0%  65.5ns     0.00B    0.0%    0.00B
   MPI mortar flux          2.35k    120μs    0.0%  51.3ns     0.00B    0.0%    0.00B
   mortar flux              2.35k    109μs    0.0%  46.4ns     0.00B    0.0%    0.00B
   source terms             2.35k   58.0μs    0.0%  24.7ns     0.00B    0.0%    0.00B
   boundary flux            2.35k   47.8μs    0.0%  20.4ns     0.00B    0.0%    0.00B
 analyze solution               6   13.3ms    1.1%  2.21ms   1.56MiB   14.1%   267KiB
 I/O                            7   10.8ms    0.9%  1.54ms   7.58MiB   68.6%  1.08MiB
   save solution                6   10.6ms    0.8%  1.76ms   7.57MiB   68.4%  1.26MiB
   get element variables        6    169μs    0.0%  28.1μs   13.8KiB    0.1%  2.30KiB
   ~I/O~                        7   12.8μs    0.0%  1.83μs   5.20KiB    0.0%     761B
   save mesh                    6    647ns    0.0%   108ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

TL;DR: Looks reasonable.

ranocha • Apr 01 '22 04:04

New results from Rocinante:

julia --project=. --check-bounds=no --threads=24

julia> using Trixi, OrdinaryDiffEq

julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_ec.jl", tspan=(0.0, 10.0),
                     initial_refinement_level=6, save_solution=TrivialCallback())

julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
 ─────────────────────────────────────────────────────────────────────────────────
            Trixi.jl                     Time                    Allocations      
                                ───────────────────────   ────────────────────────
        Tot / % measured:            8.31s /  44.9%           18.3MiB /  87.6%    

 Section                ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 rhs!                    8.77k    3.00s   80.5%   343μs   15.7MiB   98.0%  1.83KiB
   volume integral       8.77k    1.59s   42.5%   181μs   2.41MiB   15.1%     288B
   reset ∂u/∂t           8.77k    883ms   23.6%   101μs     0.00B    0.0%    0.00B
   interface flux        8.77k    289ms    7.7%  33.0μs   3.35MiB   20.9%     400B
   prolong2interfaces    8.77k   92.6ms    2.5%  10.6μs   2.01MiB   12.6%     240B
   surface integral      8.77k   89.8ms    2.4%  10.2μs   2.54MiB   15.9%     304B
   ~rhs!~                8.77k   32.1ms    0.9%  3.66μs   3.09MiB   19.3%     369B
   Jacobian              8.77k   29.6ms    0.8%  3.38μs   2.28MiB   14.2%     272B
   prolong2mortars       8.77k    473μs    0.0%  54.0ns     0.00B    0.0%    0.00B
   prolong2boundaries    8.77k    469μs    0.0%  53.5ns     0.00B    0.0%    0.00B
   mortar flux           8.77k    291μs    0.0%  33.2ns     0.00B    0.0%    0.00B
   boundary flux         8.77k    207μs    0.0%  23.5ns     0.00B    0.0%    0.00B
   source terms          8.77k    205μs    0.0%  23.4ns     0.00B    0.0%    0.00B
 calculate dt            1.75k    554ms   14.8%   316μs     0.00B    0.0%    0.00B
 analyze solution           19    175ms    4.7%  9.22ms    328KiB    2.0%  17.3KiB
 ─────────────────────────────────────────────────────────────────────────────────

julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false, thread=OrdinaryDiffEq.True()), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
 ─────────────────────────────────────────────────────────────────────────────────
            Trixi.jl                     Time                    Allocations      
                                ───────────────────────   ────────────────────────
        Tot / % measured:            3.56s /  80.3%           19.6MiB /  81.5%    

 Section                ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 rhs!                    8.77k    2.13s   74.5%   243μs   15.7MiB   98.0%  1.83KiB
   volume integral       8.77k    1.54s   53.8%   175μs   2.41MiB   15.1%     288B
   interface flux        8.77k    286ms   10.0%  32.6μs   3.35MiB   20.9%     400B
   prolong2interfaces    8.77k    120ms    4.2%  13.7μs   2.01MiB   12.6%     240B
   surface integral      8.77k   87.9ms    3.1%  10.0μs   2.54MiB   15.9%     304B
   reset ∂u/∂t           8.77k   33.9ms    1.2%  3.87μs     0.00B    0.0%    0.00B
   ~rhs!~                8.77k   31.2ms    1.1%  3.55μs   3.09MiB   19.3%     369B
   Jacobian              8.77k   30.8ms    1.1%  3.52μs   2.28MiB   14.2%     272B
   prolong2boundaries    8.77k    486μs    0.0%  55.4ns     0.00B    0.0%    0.00B
   prolong2mortars       8.77k    378μs    0.0%  43.1ns     0.00B    0.0%    0.00B
   mortar flux           8.77k    288μs    0.0%  32.8ns     0.00B    0.0%    0.00B
   boundary flux         8.77k    204μs    0.0%  23.2ns     0.00B    0.0%    0.00B
   source terms          8.77k    199μs    0.0%  22.7ns     0.00B    0.0%    0.00B
 calculate dt            1.75k    555ms   19.4%   316μs     0.00B    0.0%    0.00B
 analyze solution           19    172ms    6.0%  9.07ms    328KiB    2.0%  17.3KiB
 ─────────────────────────────────────────────────────────────────────────────────

julia> sol = solve(ode, RDPK3SpFSAL35(), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
 ─────────────────────────────────────────────────────────────────────────────────
            Trixi.jl                     Time                    Allocations      
                                ───────────────────────   ────────────────────────
        Tot / % measured:            4.52s /  35.2%           16.6MiB /  51.0%    

 Section                ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 rhs!                    4.64k    1.49s   93.7%   322μs   8.29MiB   97.8%  1.83KiB
   volume integral       4.64k    687ms   43.2%   148μs   1.27MiB   15.0%     288B
   reset ∂u/∂t           4.64k    474ms   29.8%   102μs     0.00B    0.0%    0.00B
   interface flux        4.64k    142ms    8.9%  30.7μs   1.77MiB   20.9%     400B
   ~rhs!~                4.64k   62.2ms    3.9%  13.4μs   1.64MiB   19.3%     370B
   prolong2interfaces    4.64k   54.7ms    3.4%  11.8μs   1.06MiB   12.5%     240B
   surface integral      4.64k   50.0ms    3.1%  10.8μs   1.34MiB   15.9%     304B
   Jacobian              4.64k   18.5ms    1.2%  4.00μs   1.20MiB   14.2%     272B
   prolong2mortars       4.64k    672μs    0.0%   145ns     0.00B    0.0%    0.00B
   prolong2boundaries    4.64k    520μs    0.0%   112ns     0.00B    0.0%    0.00B
   mortar flux           4.64k    345μs    0.0%  74.3ns     0.00B    0.0%    0.00B
   source terms          4.64k    127μs    0.0%  27.4ns     0.00B    0.0%    0.00B
   boundary flux         4.64k    108μs    0.0%  23.2ns     0.00B    0.0%    0.00B
 analyze solution           11    101ms    6.3%  9.19ms    189KiB    2.2%  17.2KiB
 ─────────────────────────────────────────────────────────────────────────────────

julia> sol = solve(ode, RDPK3SpFSAL35(thread=OrdinaryDiffEq.True()), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
 ─────────────────────────────────────────────────────────────────────────────────
            Trixi.jl                     Time                    Allocations      
                                ───────────────────────   ────────────────────────
        Tot / % measured:            2.57s /  44.0%           17.8MiB /  47.7%    

 Section                ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 rhs!                    4.64k    1.03s   91.2%   223μs   8.29MiB   97.8%  1.83KiB
   volume integral       4.64k    660ms   58.2%   142μs   1.27MiB   15.0%     288B
   interface flux        4.64k    142ms   12.5%  30.6μs   1.77MiB   20.9%     400B
   reset ∂u/∂t           4.64k   92.8ms    8.2%  20.0μs     0.00B    0.0%    0.00B
   prolong2interfaces    4.64k   62.0ms    5.5%  13.4μs   1.06MiB   12.5%     240B
   surface integral      4.64k   45.4ms    4.0%  9.79μs   1.34MiB   15.9%     304B
   ~rhs!~                4.64k   17.1ms    1.5%  3.69μs   1.64MiB   19.3%     370B
   Jacobian              4.64k   14.4ms    1.3%  3.11μs   1.20MiB   14.2%     272B
   prolong2boundaries    4.64k    238μs    0.0%  51.3ns     0.00B    0.0%    0.00B
   mortar flux           4.64k    189μs    0.0%  40.8ns     0.00B    0.0%    0.00B
   prolong2mortars       4.64k    183μs    0.0%  39.5ns     0.00B    0.0%    0.00B
   boundary flux         4.64k    108μs    0.0%  23.2ns     0.00B    0.0%    0.00B
   source terms          4.64k    105μs    0.0%  22.7ns     0.00B    0.0%    0.00B
 analyze solution           11   99.4ms    8.8%  9.04ms    190KiB    2.2%  17.2KiB
 ─────────────────────────────────────────────────────────────────────────────────

tmpi 2 julia --project=. --check-bounds=no --threads=12

julia> using Trixi, OrdinaryDiffEq

julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_ec.jl", tspan=(0.0, 10.0),
                     initial_refinement_level=6, save_solution=TrivialCallback())

julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              5.61s /  58.3%           46.1MiB /  97.2%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       8.77k    2.84s   86.7%   323μs   25.4MiB   56.7%  2.97KiB
   volume integral          8.77k    1.54s   47.2%   176μs   2.81MiB    6.3%     336B
   reset ∂u/∂t              8.77k    415ms   12.7%  47.4μs     0.00B    0.0%    0.00B
   interface flux           8.77k    282ms    8.6%  32.2μs   3.35MiB    7.5%     400B
   finish MPI receive       8.77k    194ms    5.9%  22.1μs   1.07MiB    2.4%     128B
   surface integral         8.77k   94.0ms    2.9%  10.7μs   2.54MiB    5.7%     304B
   start MPI send           8.77k   93.2ms    2.8%  10.6μs    822KiB    1.8%    96.0B
   prolong2interfaces       8.77k   85.0ms    2.6%  9.69μs   2.01MiB    4.5%     240B
   ~rhs!~                   8.77k   33.6ms    1.0%  3.83μs   3.49MiB    7.8%     418B
   MPI interface flux       8.77k   31.2ms    1.0%  3.55μs   3.35MiB    7.5%     400B
   Jacobian                 8.77k   29.2ms    0.9%  3.33μs   2.41MiB    5.4%     288B
   prolong2mpiinterfaces    8.77k   27.1ms    0.8%  3.09μs   1.87MiB    4.2%     224B
   finish MPI send          8.77k   2.14ms    0.1%   244ns   1.20MiB    2.7%     144B
   start MPI receive        8.77k   1.89ms    0.1%   216ns    548KiB    1.2%    64.0B
   prolong2boundaries       8.77k    547μs    0.0%  62.4ns     0.00B    0.0%    0.00B
   prolong2mpimortars       8.77k    401μs    0.0%  45.7ns     0.00B    0.0%    0.00B
   prolong2mortars          8.77k    388μs    0.0%  44.3ns     0.00B    0.0%    0.00B
   MPI mortar flux          8.77k    368μs    0.0%  42.0ns     0.00B    0.0%    0.00B
   mortar flux              8.77k    287μs    0.0%  32.7ns     0.00B    0.0%    0.00B
   source terms             8.77k    203μs    0.0%  23.2ns     0.00B    0.0%    0.00B
   boundary flux            8.77k    201μs    0.0%  22.9ns     0.00B    0.0%    0.00B
 calculate dt               1.75k    335ms   10.2%   191μs    165KiB    0.4%    96.0B
 analyze solution              19    101ms    3.1%  5.29ms   19.2MiB   42.9%  1.01MiB
 ────────────────────────────────────────────────────────────────────────────────────

julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false, thread=OrdinaryDiffEq.True()), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              3.30s /  82.8%           48.5MiB /  92.5%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       8.77k    2.33s   85.5%   266μs   25.4MiB   56.7%  2.97KiB
   volume integral          8.77k    1.53s   56.2%   175μs   2.81MiB    6.3%     336B
   interface flux           8.77k    297ms   10.9%  33.9μs   3.35MiB    7.5%     400B
   prolong2interfaces       8.77k    105ms    3.9%  12.0μs   2.01MiB    4.5%     240B
   finish MPI receive       8.77k   98.0ms    3.6%  11.2μs   1.07MiB    2.4%     128B
   surface integral         8.77k   86.5ms    3.2%  9.87μs   2.54MiB    5.7%     304B
   start MPI send           8.77k   62.5ms    2.3%  7.12μs    822KiB    1.8%    96.0B
   ~rhs!~                   8.77k   33.9ms    1.2%  3.86μs   3.49MiB    7.8%     418B
   MPI interface flux       8.77k   33.0ms    1.2%  3.77μs   3.35MiB    7.5%     400B
   Jacobian                 8.77k   28.2ms    1.0%  3.22μs   2.41MiB    5.4%     288B
   reset ∂u/∂t              8.77k   27.1ms    1.0%  3.09μs     0.00B    0.0%    0.00B
   prolong2mpiinterfaces    8.77k   20.6ms    0.8%  2.35μs   1.87MiB    4.2%     224B
   finish MPI send          8.77k   2.44ms    0.1%   279ns   1.20MiB    2.7%     144B
   start MPI receive        8.77k   1.81ms    0.1%   207ns    548KiB    1.2%    64.0B
   prolong2boundaries       8.77k    404μs    0.0%  46.0ns     0.00B    0.0%    0.00B
   prolong2mortars          8.77k    380μs    0.0%  43.3ns     0.00B    0.0%    0.00B
   MPI mortar flux          8.77k    341μs    0.0%  38.9ns     0.00B    0.0%    0.00B
   prolong2mpimortars       8.77k    341μs    0.0%  38.8ns     0.00B    0.0%    0.00B
   mortar flux              8.77k    250μs    0.0%  28.5ns     0.00B    0.0%    0.00B
   source terms             8.77k    203μs    0.0%  23.1ns     0.00B    0.0%    0.00B
   boundary flux            8.77k    201μs    0.0%  22.9ns     0.00B    0.0%    0.00B
 calculate dt               1.75k    295ms   10.8%   168μs    165KiB    0.4%    96.0B
 analyze solution              19    101ms    3.7%  5.32ms   19.2MiB   42.9%  1.01MiB
 ────────────────────────────────────────────────────────────────────────────────────

julia> sol = solve(ode, RDPK3SpFSAL35(), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              3.01s /  45.0%           29.0MiB /  84.7%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       4.64k    1.30s   95.7%   280μs   13.5MiB   54.7%  2.97KiB
   volume integral          4.64k    677ms   50.0%   146μs   1.49MiB    6.0%     336B
   reset ∂u/∂t              4.64k    233ms   17.2%  50.3μs     0.00B    0.0%    0.00B
   interface flux           4.64k    137ms   10.1%  29.6μs   1.77MiB    7.2%     400B
   surface integral         4.64k   47.6ms    3.5%  10.3μs   1.34MiB    5.5%     304B
   finish MPI receive       4.64k   47.0ms    3.5%  10.1μs    580KiB    2.3%     128B
   prolong2interfaces       4.64k   45.1ms    3.3%  9.72μs   1.06MiB    4.3%     240B
   start MPI send           4.64k   44.5ms    3.3%  9.60μs    435KiB    1.7%    96.0B
   ~rhs!~                   4.64k   18.2ms    1.3%  3.92μs   1.85MiB    7.5%     419B
   MPI interface flux       4.64k   15.9ms    1.2%  3.43μs   1.77MiB    7.2%     400B
   Jacobian                 4.64k   15.1ms    1.1%  3.26μs   1.27MiB    5.2%     288B
   prolong2mpiinterfaces    4.64k   13.1ms    1.0%  2.82μs   0.99MiB    4.0%     224B
   start MPI receive        4.64k   1.09ms    0.1%   235ns    290KiB    1.2%    64.0B
   finish MPI send          4.64k    982μs    0.1%   212ns    652KiB    2.6%     144B
   prolong2boundaries       4.64k    284μs    0.0%  61.3ns     0.00B    0.0%    0.00B
   prolong2mpimortars       4.64k    238μs    0.0%  51.2ns     0.00B    0.0%    0.00B
   prolong2mortars          4.64k    217μs    0.0%  46.8ns     0.00B    0.0%    0.00B
   MPI mortar flux          4.64k    196μs    0.0%  42.4ns     0.00B    0.0%    0.00B
   mortar flux              4.64k    140μs    0.0%  30.1ns     0.00B    0.0%    0.00B
   boundary flux            4.64k    118μs    0.0%  25.4ns     0.00B    0.0%    0.00B
   source terms             4.64k    112μs    0.0%  24.0ns     0.00B    0.0%    0.00B
 analyze solution              11   57.7ms    4.3%  5.25ms   11.1MiB   45.3%  1.01MiB
 ────────────────────────────────────────────────────────────────────────────────────

julia> sol = solve(ode, RDPK3SpFSAL35(thread=OrdinaryDiffEq.True()), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              2.17s /  55.6%           31.0MiB /  79.2%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       4.64k    1.11s   92.2%   240μs   13.5MiB   54.7%  2.97KiB
   volume integral          4.64k    662ms   54.9%   143μs   1.49MiB    6.0%     336B
   interface flux           4.64k    135ms   11.2%  29.1μs   1.77MiB    7.2%     400B
   finish MPI receive       4.64k   57.1ms    4.7%  12.3μs    580KiB    2.3%     128B
   reset ∂u/∂t              4.64k   56.8ms    4.7%  12.2μs     0.00B    0.0%    0.00B
   prolong2interfaces       4.64k   56.7ms    4.7%  12.2μs   1.06MiB    4.3%     240B
   surface integral         4.64k   48.3ms    4.0%  10.4μs   1.34MiB    5.5%     304B
   start MPI send           4.64k   32.3ms    2.7%  6.97μs    435KiB    1.7%    96.0B
   ~rhs!~                   4.64k   17.5ms    1.5%  3.78μs   1.85MiB    7.5%     419B
   MPI interface flux       4.64k   15.7ms    1.3%  3.38μs   1.77MiB    7.2%     400B
   Jacobian                 4.64k   15.2ms    1.3%  3.28μs   1.27MiB    5.2%     288B
   prolong2mpiinterfaces    4.64k   11.2ms    0.9%  2.42μs   0.99MiB    4.0%     224B
   finish MPI send          4.64k   1.35ms    0.1%   292ns    652KiB    2.6%     144B
   start MPI receive        4.64k    919μs    0.1%   198ns    290KiB    1.2%    64.0B
   prolong2boundaries       4.64k    226μs    0.0%  48.7ns     0.00B    0.0%    0.00B
   prolong2mpimortars       4.64k    215μs    0.0%  46.4ns     0.00B    0.0%    0.00B
   prolong2mortars          4.64k    203μs    0.0%  43.7ns     0.00B    0.0%    0.00B
   MPI mortar flux          4.64k    199μs    0.0%  42.9ns     0.00B    0.0%    0.00B
   mortar flux              4.64k    137μs    0.0%  29.6ns     0.00B    0.0%    0.00B
   source terms             4.64k    111μs    0.0%  24.0ns     0.00B    0.0%    0.00B
   boundary flux            4.64k    106μs    0.0%  22.8ns     0.00B    0.0%    0.00B
 analyze solution              11   93.9ms    7.8%  8.54ms   11.1MiB   45.3%  1.01MiB
 ────────────────────────────────────────────────────────────────────────────────────

Looks okay, doesn't it? In particular, there seems to be an effect of using multi-threading also for the RK solver.

ranocha avatar Apr 01 '22 13:04 ranocha

> Looks okay, doesn't it? In particular, there seems to be an effect of using multi-threading also for the RK solver.

Yes, it looks ok. Although it's not clear yet what the performance impact really is (hard to tell with such a small problem size) and whether it makes more sense to use more threads or more ranks. Then again, this is often hardware dependent...

sloede avatar Apr 01 '22 14:04 sloede

> > Looks okay, doesn't it? In particular, there seems to be an effect of using multi-threading also for the RK solver.
>
> Yes, it looks ok. Although it's not clear yet what the performance impact really is (hard to tell with such a small problem size) and whether it makes more sense to use more threads or more ranks. Then again, this is often hardware dependent...

My intention was just to test whether it works at all - I'll leave the rest to you HLRS guys :sweat_smile:

ranocha avatar Apr 01 '22 15:04 ranocha

Do you understand why the serial p4est runs fail? Why would the results change? Is it because we do not use raw PtrArrays anymore and thus OrdinaryDiffEq.jl does something different under the hood when computing the time step update?

sloede avatar Apr 03 '22 07:04 sloede

> Do you understand why the serial p4est runs fail? Why would the results change? Is it because we do not use raw PtrArrays anymore and thus OrdinaryDiffEq.jl does something different under the hood when computing the time step update?

No idea... It's elixir_advection_basic.jl, everything else passes :confused:

ranocha avatar Apr 03 '22 08:04 ranocha

> > Do you understand why the serial p4est runs fail? Why would the results change? Is it because we do not use raw PtrArrays anymore and thus OrdinaryDiffEq.jl does something different under the hood when computing the time step update?
>
> No idea... It's elixir_advection_basic.jl, everything else passes 😕

Positive: Now everything that was "weirdly" broken passes. Negative: macOS tests are still hanging...

sloede avatar Apr 03 '22 11:04 sloede

Yeah... but I can't really debug the macOS part (since I don't have a Mac)

ranocha avatar Apr 03 '22 13:04 ranocha

Could you see which test is the issue? If yes, we can try disabling it to check whether it's a singleton issue or a general problem. Although we should try to find the root cause either way.

sloede avatar Apr 03 '22 13:04 sloede

> Could you see which test is the issue? If yes, we can try disabling it to check whether it's a singleton issue or a general problem. Although we should try to find the root cause either way.

Looks like it's examples/tree_2d_dgsem/elixir_euler_ec.jl with error-based step size control :cry:

ranocha avatar Apr 03 '22 14:04 ranocha

> > Could you see which test is the issue? If yes, we can try disabling it to check whether it's a singleton issue or a general problem. Although we should try to find the root cause either way.
>
> Looks like it's examples/tree_2d_dgsem/elixir_euler_ec.jl with error-based step size control 😢

@andrewwinters5000 It would be great if you could try to reproduce this issue.

sloede avatar Apr 03 '22 20:04 sloede

I got rid of the global length completely, since it leads to hard-to-find bugs. Let's see what happens now...

ranocha avatar Apr 04 '22 09:04 ranocha
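
For readers following along, here is a minimal sketch of why lengths are delicate for an MPI-aware error norm. It assumes an RMS-type norm (as OrdinaryDiffEq.jl uses by default, as far as I know) and simulates the ranks with plain vectors, so no MPI calls are made. The names `rank_data`, `global_norm`, and `naive_norm` are made up for illustration; this is not the implementation in this PR.

```julia
# Each inner vector stands in for the data owned by one MPI rank
# (hypothetical values; the ranks deliberately own different amounts of data).
rank_data = [[3.0, 4.0], [0.0, 0.0, 5.0, 12.0]]

# Per-rank contributions: local sum of squares and local number of entries.
local_sums    = [sum(abs2, u) for u in rank_data]
local_lengths = [length(u) for u in rank_data]

# Stand-in for a sum reduction across all ranks (e.g. an Allreduce).
global_sum    = sum(local_sums)
global_length = sum(local_lengths)

# RMS norm of the full distributed vector, built only from local quantities.
global_norm = sqrt(global_sum / global_length)

# Reference: the same norm computed on the concatenated (serial) vector.
serial      = reduce(vcat, rank_data)
serial_norm = sqrt(sum(abs2, serial) / length(serial))

# Averaging the per-rank RMS norms is NOT equivalent when ranks differ in size.
naive_norm = sum(sqrt(s / n) for (s, n) in zip(local_sums, local_lengths)) / length(rank_data)

println("global = ", global_norm, ", serial = ", serial_norm, ", naive = ", naive_norm)
```

Reducing the local length together with the local sum means the array type itself never has to report a global `length`; whether that is exactly the approach taken here is an assumption on my part.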

MPI tests pass :partying_face: @sloede Please have a look at the new stuff. Right now, our calling convention must be

sol = solve(ode, alg; kwargs..., internalnorm=ode_norm, unstable_check=ode_unstable_check)

We should probably make it easier to use all this but it seems to be working.

ranocha avatar Apr 04 '22 12:04 ranocha
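
One possible way to make the calling convention above less error-prone is to collect the two keyword arguments in a small helper and splat the result into `solve`. The sketch below is only an illustration: `mpi_solve_kwargs` is a hypothetical name, and the bodies of `ode_norm` and `ode_unstable_check` are serial placeholders so that the snippet runs on its own. The real functions in this PR would perform MPI reductions.

```julia
# Placeholder stand-ins for the functions from this PR (hypothetical bodies,
# serial only). Signatures follow the `internalnorm(u, t)` and
# `unstable_check(dt, u, p, t)` conventions of the common solver options.
ode_norm(u, t) = sqrt(sum(abs2, u) / length(u))
ode_unstable_check(dt, u, p, t) = any(isnan, u)

# Hypothetical helper: returns the keyword arguments as a NamedTuple.
# User-supplied keywords are merged last, so they override the defaults.
function mpi_solve_kwargs(; kwargs...)
    defaults = (internalnorm=ode_norm, unstable_check=ode_unstable_check)
    return merge(defaults, values(kwargs))
end

kw = mpi_solve_kwargs(abstol=1.0e-4, reltol=1.0e-4)
println(keys(kw))

# Intended usage (not executed here, since it needs OrdinaryDiffEq.jl and an ODE problem):
#   sol = solve(ode, alg; mpi_solve_kwargs(abstol=1.0e-4, reltol=1.0e-4)...)
```

Returning a `NamedTuple` keeps the helper independent of any solver package, and passing `internalnorm` or `unstable_check` explicitly still works because later entries win in `merge`.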