WIP: Add AMR performance test

Open pgrete opened this issue 3 years ago • 4 comments

PR Summary

This test reflects the test case we used during the hackathon, i.e., 32^3 blocks with the following "evolution":

cycle=0 time=0.0000000000000000e+00 dt=8.7890624999999991e-04 zone-cycles/wsec_step=0.00e+00 wsec_total=5.54e-03 wsec_step=5.47e+00 zone-cycles/wsec=0.00e+00 wsec_AMR=0.00e+00
---------------------- Current Mesh structure ----------------------
Root grid = 4 x 4 x 4 MeshBlocks
Total number of MeshBlocks = 232
Number of physical refinement levels = 3
Number of logical  refinement levels = 5
  Physical level = 0 (logical level = 2): 56 MeshBlocks, cost = 56
  Physical level = 1 (logical level = 3): 56 MeshBlocks, cost = 56
  Physical level = 2 (logical level = 4): 56 MeshBlocks, cost = 56
  Physical level = 3 (logical level = 5): 64 MeshBlocks, cost = 64
--------------------------------------------------------------------
cycle=1 time=8.7890624999999991e-04 dt=8.7890624999999991e-04 zone-cycles/wsec_step=8.69e+06 wsec_total=8.81e-01 wsec_step=8.75e-01 zone-cycles/wsec=8.69e+06 wsec_AMR=2.03e-05
cycle=2 time=1.7578124999999998e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=1.20e+07 wsec_total=1.52e+00 wsec_step=6.35e-01 zone-cycles/wsec=1.20e+07 wsec_AMR=1.14e-05
cycle=3 time=2.6367187499999997e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=1.20e+07 wsec_total=2.15e+00 wsec_step=6.33e-01 zone-cycles/wsec=1.20e+07 wsec_AMR=6.23e-06
cycle=4 time=3.5156249999999997e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=1.21e+07 wsec_total=2.78e+00 wsec_step=6.29e-01 zone-cycles/wsec=1.21e+07 wsec_AMR=4.82e-06
cycle=5 time=4.3945312500000000e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=1.17e+07 wsec_total=3.43e+00 wsec_step=6.48e-01 zone-cycles/wsec=1.17e+07 wsec_AMR=1.79e-05
cycle=6 time=5.2734375000000003e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=1.20e+07 wsec_total=4.06e+00 wsec_step=6.33e-01 zone-cycles/wsec=1.20e+07 wsec_AMR=4.91e-06
cycle=7 time=6.1523437500000007e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=1.21e+07 wsec_total=9.03e+00 wsec_step=6.30e-01 zone-cycles/wsec=1.53e+06 wsec_AMR=4.34e+00
-------------- New Mesh structure after (de)refinement -------------
Root grid = 4 x 4 x 4 MeshBlocks
Total number of MeshBlocks = 484
Number of physical refinement levels = 3
Number of logical  refinement levels = 5
  Physical level = 0 (logical level = 2): 44 MeshBlocks, cost = 44
  Physical level = 1 (logical level = 3): 140 MeshBlocks, cost = 140
  Physical level = 2 (logical level = 4): 140 MeshBlocks, cost = 140
  Physical level = 3 (logical level = 5): 160 MeshBlocks, cost = 160
--------------------------------------------------------------------
cycle=8 time=7.0312500000000010e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=9.08e+06 wsec_total=1.08e+01 wsec_step=1.75e+00 zone-cycles/wsec=9.08e+06 wsec_AMR=1.80e-05
cycle=9 time=7.9101562500000014e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=1.17e+07 wsec_total=1.21e+01 wsec_step=1.35e+00 zone-cycles/wsec=1.17e+07 wsec_AMR=1.86e-05
cycle=10 time=8.7890625000000017e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=1.17e+07 wsec_total=1.35e+01 wsec_step=1.35e+00 zone-cycles/wsec=1.17e+07 wsec_AMR=1.88e-05
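
As a rough cross-check of the numbers in the log above, the reported rates can be reproduced from the block counts. This is a minimal sketch, assuming one "zone-cycle" is one zone advanced by one cycle, so a step over N blocks of 32^3 zones performs N * 32^3 zone-cycles, and assuming that `zone-cycles/wsec` additionally counts the AMR time. Both assumptions are inferred from the log, not taken from the Parthenon source, and `zone_cycles_per_wsec` is a hypothetical helper name.

```python
# Assumption: each MeshBlock holds 32^3 zones (ghost zones not counted).
ZONES_PER_BLOCK = 32 ** 3


def zone_cycles_per_wsec(num_blocks, wsec_step, wsec_amr=0.0):
    """Zones updated per wall-clock second for a single cycle."""
    return num_blocks * ZONES_PER_BLOCK / (wsec_step + wsec_amr)


# cycle 1: 232 blocks, wsec_step=8.75e-01, negligible AMR time
rate_cycle1 = zone_cycles_per_wsec(232, 8.75e-01)

# cycle 7: same 232 blocks, but the remeshing adds wsec_AMR=4.34e+00
rate_cycle7 = zone_cycles_per_wsec(232, 6.30e-01, 4.34e+00)

print(f"{rate_cycle1:.2e}")  # 8.69e+06, as logged for cycle 1
print(f"{rate_cycle7:.2e}")  # 1.53e+06, as logged for zone-cycles/wsec in cycle 7
```

Under these assumptions, the drop in `zone-cycles/wsec` at cycle 7 is explained entirely by the 4.34 s spent in AMR, while `zone-cycles/wsec_step` stays at about 1.2e+07.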

The resulting plot currently looks like:

[Image: amr_performance]

(Unfortunately, the IAS CI machine currently reports "no space left on device", which I cannot fix directly.)

Right now, all the kernel and region data is also collected (i.e., there will be json files in the output directory) but not yet processed. I'll add that next week along with a little documentation (especially with respect to the Kokkos profiling tools that are now required here).

Nevertheless, I'm already happy to get some early feedback. I also wanted to get this out for comparing the performance of #508.

Finally, thinking ahead, @JoshuaSBrown: I used the following script to create the plot in #508, which compares the performance across results taken from different runs, and I could imagine something along those lines being integrated in the performance app (also including CPU runs).



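import numpy as np
import matplotlib.pyplot as plt

# (output directory, legend label) pairs for the runs to compare
path_commits = [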
    ('/mnt/home/gretephi/tmp-perf-data', 'develop'),
    ('/mnt/home/gretephi/src/parthenon-fork/build-cuda/tst/regression/outputs/amr_performance', 'unify restrict.')
]

num_rows = 4
num_cols = 3
fig, p = plt.subplots(num_rows, num_cols, sharex=True, sharey="row", figsize=(10,10))

for path, commit in path_commits:

    for i, (step, lbl) in enumerate([(1, "9 var. with 1 component "),
                                     (3, "3 var. with 3 components"),
                                     (5, "1 var. with 9 components"),
                       ]):
        with open(path + "/step_%d.out" % (step), "rb") as infile:
            cycle_all = []
            for line in infile.readlines():
                line = line.decode("utf-8")
                # sample output:
                # cycle=3 time=2.6367187499999997e-03 dt=8.7890624999999991e-04 zone-cycles/wsec_step=9.16e+07 wsec_total=4.00e-01 wsec_step=8.30e-02 zone-cycles/wsec=9.16e+07 wsec_AMR=3.60e-06
                if "cycle=" == line[:6]:
                    cycle_current = []
                    for vals in line.split(" "):
                        cycle_current.append(float(vals.split("=")[-1]))
                    cycle_all.append(cycle_current)

        # convert to array and skip cycle 0
        cycle_all = np.array(cycle_all)[1:,:]

        label = commit + ": tot wtime %.2f" % cycle_all[-1,4]
        p[0, i].plot(cycle_all[:,0], cycle_all[:,3], label=label)
        p[1, i].plot(cycle_all[:,0], cycle_all[:,5])
        p[2, i].plot(cycle_all[:,0], cycle_all[:,6])
        p[3, i].plot(cycle_all[:,0], cycle_all[:,7])
        
        p[0, i].set_title(lbl)

p[0,0].set_ylabel("zone-cycles/wsec_step")
p[1,0].set_ylabel("wsec_step")
p[2,0].set_ylabel("zone-cycles/wsec")
p[3,0].set_ylabel("wsec_AMR")

for j in range(num_cols):
    p[-1,j].set_xlabel("cycle #")
    p[0,j].legend(fontsize=8)
    for i in range(num_rows):
        p[i,j].grid()

fig.tight_layout()
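
The script above indexes the parsed values by column position (e.g., `cycle_all[:,5]` for `wsec_step`), which would silently break if the output format gained or reordered a field. A possible alternative, sketched here under the assumption that every token on a status line has the form `key=value`, is to parse each line into a dictionary keyed by field name. `parse_cycle_line` is a hypothetical helper, not part of the PR.

```python
def parse_cycle_line(line):
    """Return {field name: float} for a 'cycle=...' status line, else None."""
    if not line.startswith("cycle="):
        return None
    fields = {}
    for token in line.split():
        # Split at the first '=' only; the keys themselves contain no '='.
        key, _, value = token.partition("=")
        fields[key] = float(value)
    return fields


sample = (
    "cycle=3 time=2.6367187499999997e-03 dt=8.7890624999999991e-04 "
    "zone-cycles/wsec_step=9.16e+07 wsec_total=4.00e-01 wsec_step=8.30e-02 "
    "zone-cycles/wsec=9.16e+07 wsec_AMR=3.60e-06"
)
parsed = parse_cycle_line(sample)
print(parsed["wsec_step"], parsed["wsec_AMR"])
```

Lines that do not start with `cycle=`, such as the mesh structure summaries, return `None` and can be skipped by the caller.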

PR Checklist

  • [x] Code passes cpplint
  • [ ] New features are documented.
  • [ ] Adds a test for any bugs fixed. Adds tests for new features.
  • [x] Code is formatted
  • [ ] Changes are summarized in CHANGELOG.md
  • [ ] CI has been triggered on Darwin for performance regression tests.
  • [ ] (@lanl.gov employees) Update copyright on changed files

pgrete, May 28 '21 16:05

This is great @pgrete! I'll be really glad to see how this does once we get it implemented.

JoshuaSBrown, May 29 '21 03:05

@pgrete Is this ready for review?

jlippuner, Aug 05 '21 17:08

I believe this is way out of date and the relevant testing has long since been merged after the hackathon. Should we just close this and delete the branch, @pgrete ?

Yurlungur, Apr 14 '23 23:04

IIRC we still don't have "AMR performance" tests. Most of the code is still valid so I'd like to keep it for now. Depending on where people see this in terms of priorities, I could move this further up on my todo list.

pgrete, Apr 17 '23 14:04