OceanBioME.jl

Compare GPU and CPU runs

Open · MarionBWeinzierl opened this issue 5 months ago · 3 comments

GPU runs of the code base have memory limitations. It would be interesting to compare profiling results from GPU runs to those from CPU runs, and see whether anything useful can be learnt from that comparison.

MarionBWeinzierl · Jul 08 '25 13:07

This preprint might be of general interest: https://arxiv.org/pdf/2502.14148

And this paper discusses the performance of the physics: https://doi.org/10.1029/2024MS004465. As it alludes to, there hasn't been much focus on optimising Oceananigans for CPU.

jagoosw · Jul 08 '25 18:07

I ran @johnryantaylor's LOBSTER_3D.jl script. Here are the summaries for both the CPU and the GPU run. As a starting point, they should give us some insight into where to look.

CPU run:

[ Info: Initializing simulation...
i:      0, sim time:  0 seconds, wall time:  0 seconds, Δt: 17.188 minutes, CFL: 3.27e-01
[ Info:     ... simulation initialization complete (2.172 minutes)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (32.920 seconds).
i:     20, sim time:    6 hours, wall time: 2.773 minutes, Δt: 25.164 minutes, CFL: 4.74e-01
i:     40, sim time:   15 hours, wall time: 2.808 minutes, Δt: 30 minutes, CFL: 5.75e-01
i:     60, sim time: 1.042 days, wall time: 2.842 minutes, Δt: 30 minutes, CFL: 6.05e-01
i:     80, sim time: 1.458 days, wall time: 2.876 minutes, Δt: 30 minutes, CFL: 6.28e-01
i:    100, sim time: 1.875 days, wall time: 2.911 minutes, Δt: 30 minutes, CFL: 6.60e-01
i:    120, sim time: 2.292 days, wall time: 2.945 minutes, Δt: 30 minutes, CFL: 7.28e-01
i:    140, sim time: 2.642 days, wall time: 2.978 minutes, Δt: 27.492 minutes, CFL: 7.50e-01
i:    160, sim time: 2.952 days, wall time: 3.014 minutes, Δt: 24.261 minutes, CFL: 7.50e-01
i:    180, sim time: 3.212 days, wall time: 3.047 minutes, Δt: 21.015 minutes, CFL: 7.50e-01
i:    200, sim time: 3.455 days, wall time: 3.080 minutes, Δt: 18.161 minutes, CFL: 7.50e-01
i:    220, sim time: 3.678 days, wall time: 3.112 minutes, Δt: 16.496 minutes, CFL: 7.50e-01
i:    240, sim time: 3.879 days, wall time: 3.145 minutes, Δt: 15.970 minutes, CFL: 7.50e-01
i:    260, sim time: 4.083 days, wall time: 3.177 minutes, Δt: 14.784 minutes, CFL: 7.50e-01
i:    280, sim time: 4.270 days, wall time: 3.210 minutes, Δt: 14.334 minutes, CFL: 7.50e-01
i:    300, sim time: 4.455 days, wall time: 3.242 minutes, Δt: 13.436 minutes, CFL: 7.50e-01
i:    320, sim time: 4.637 days, wall time: 3.274 minutes, Δt: 12.716 minutes, CFL: 7.50e-01
i:    340, sim time: 4.800 days, wall time: 3.307 minutes, Δt: 11.968 minutes, CFL: 7.50e-01
i:    360, sim time: 4.949 days, wall time: 3.340 minutes, Δt: 11.609 minutes, CFL: 7.50e-01
i:    380, sim time: 5.100 days, wall time: 3.372 minutes, Δt: 11.238 minutes, CFL: 7.50e-01
i:    400, sim time: 5.245 days, wall time: 3.404 minutes, Δt: 9.863 minutes, CFL: 7.50e-01
i:    420, sim time: 5.374 days, wall time: 3.436 minutes, Δt: 9.743 minutes, CFL: 7.50e-01
i:    440, sim time: 5.500 days, wall time: 3.468 minutes, Δt: 9.838 minutes, CFL: 7.50e-01
i:    460, sim time: 5.631 days, wall time: 3.501 minutes, Δt: 9.552 minutes, CFL: 7.50e-01
i:    480, sim time: 5.750 days, wall time: 3.533 minutes, Δt: 8.801 minutes, CFL: 7.50e-01
i:    500, sim time: 5.871 days, wall time: 3.565 minutes, Δt: 9.062 minutes, CFL: 7.50e-01
i:    520, sim time: 5.993 days, wall time: 3.597 minutes, Δt: 9.354 minutes, CFL: 7.50e-01
i:    540, sim time: 6.116 days, wall time: 3.629 minutes, Δt: 9.563 minutes, CFL: 7.50e-01
i:    560, sim time: 6.249 days, wall time: 3.661 minutes, Δt: 10.210 minutes, CFL: 7.50e-01
i:    580, sim time: 6.383 days, wall time: 3.694 minutes, Δt: 10.299 minutes, CFL: 7.50e-01
i:    600, sim time: 6.522 days, wall time: 3.726 minutes, Δt: 10.573 minutes, CFL: 7.50e-01
i:    620, sim time: 6.667 days, wall time: 3.758 minutes, Δt: 10.944 minutes, CFL: 7.50e-01
i:    640, sim time: 6.802 days, wall time: 3.790 minutes, Δt: 10.658 minutes, CFL: 7.50e-01
i:    660, sim time: 6.938 days, wall time: 3.823 minutes, Δt: 10.247 minutes, CFL: 7.50e-01
i:    680, sim time: 7.079 days, wall time: 3.854 minutes, Δt: 10.487 minutes, CFL: 7.50e-01
i:    700, sim time: 7.214 days, wall time: 3.886 minutes, Δt: 9.901 minutes, CFL: 7.50e-01
i:    720, sim time: 7.347 days, wall time: 3.919 minutes, Δt: 10.013 minutes, CFL: 7.50e-01
i:    740, sim time: 7.478 days, wall time: 3.950 minutes, Δt: 9.772 minutes, CFL: 7.50e-01
i:    760, sim time: 7.596 days, wall time: 3.982 minutes, Δt: 9.381 minutes, CFL: 7.50e-01
i:    780, sim time: 7.726 days, wall time: 4.014 minutes, Δt: 9.478 minutes, CFL: 7.50e-01
i:    800, sim time: 7.853 days, wall time: 4.046 minutes, Δt: 9.238 minutes, CFL: 7.50e-01
i:    820, sim time: 7.979 days, wall time: 4.078 minutes, Δt: 9.083 minutes, CFL: 7.50e-01
i:    840, sim time: 8.090 days, wall time: 4.110 minutes, Δt: 9.259 minutes, CFL: 7.50e-01
i:    860, sim time: 8.219 days, wall time: 4.142 minutes, Δt: 9.746 minutes, CFL: 7.50e-01
i:    880, sim time: 8.355 days, wall time: 4.226 minutes, Δt: 10.690 minutes, CFL: 7.50e-01
i:    900, sim time: 8.500 days, wall time: 4.258 minutes, Δt: 11.465 minutes, CFL: 7.50e-01
i:    920, sim time: 8.646 days, wall time: 4.290 minutes, Δt: 11.268 minutes, CFL: 7.50e-01
i:    940, sim time: 8.795 days, wall time: 4.322 minutes, Δt: 10.565 minutes, CFL: 7.50e-01
i:    960, sim time: 8.931 days, wall time: 4.355 minutes, Δt: 10.409 minutes, CFL: 7.50e-01
i:    980, sim time: 9.072 days, wall time: 4.387 minutes, Δt: 10.318 minutes, CFL: 7.50e-01
i:   1000, sim time: 9.210 days, wall time: 4.419 minutes, Δt: 10.603 minutes, CFL: 7.50e-01
i:   1020, sim time: 9.341 days, wall time: 4.451 minutes, Δt: 10.801 minutes, CFL: 7.50e-01
i:   1040, sim time: 9.486 days, wall time: 4.483 minutes, Δt: 10.937 minutes, CFL: 7.50e-01
i:   1060, sim time: 9.636 days, wall time: 4.515 minutes, Δt: 10.595 minutes, CFL: 7.50e-01
i:   1080, sim time: 9.771 days, wall time: 4.547 minutes, Δt: 10.179 minutes, CFL: 7.50e-01
i:   1100, sim time: 9.910 days, wall time: 4.578 minutes, Δt: 9.959 minutes, CFL: 7.50e-01
[ Info: Simulation is stopping after running for 4.601 minutes.
[ Info: Simulation time 10 days equals or exceeds stop time 10 days.

[ Info: Saved animation to /rds/user/ab3191/hpc-work/software/OceanBioME.jl/profiling/LOBSTER.mp4

real    9m39.587s
user    8m39.956s
sys     0m7.836s

GPU run:

[ Info: Initializing simulation...
i:      0, sim time:  0 seconds, wall time:  0 seconds, Δt: 17.188 minutes, CFL: 3.27e-01
[ Info:     ... simulation initialization complete (8.956 minutes)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (50.104 seconds).
i:     20, sim time:    6 hours, wall time: 9.822 minutes, Δt: 25.164 minutes, CFL: 4.70e-01
i:     40, sim time:   15 hours, wall time: 9.830 minutes, Δt: 30 minutes, CFL: 5.79e-01
i:     60, sim time: 1.042 days, wall time: 9.839 minutes, Δt: 30 minutes, CFL: 6.02e-01
i:     80, sim time: 1.458 days, wall time: 9.847 minutes, Δt: 30 minutes, CFL: 6.24e-01
i:    100, sim time: 1.875 days, wall time: 9.856 minutes, Δt: 30 minutes, CFL: 6.59e-01
i:    120, sim time: 2.292 days, wall time: 9.864 minutes, Δt: 30 minutes, CFL: 7.37e-01
i:    140, sim time: 2.667 days, wall time: 9.872 minutes, Δt: 28.759 minutes, CFL: 7.50e-01
i:    160, sim time:     3 days, wall time: 9.880 minutes, Δt: 22.670 minutes, CFL: 7.50e-01
i:    180, sim time: 3.264 days, wall time: 9.888 minutes, Δt: 18.783 minutes, CFL: 7.50e-01
i:    200, sim time: 3.500 days, wall time: 9.895 minutes, Δt: 16.632 minutes, CFL: 7.50e-01
i:    220, sim time: 3.698 days, wall time: 9.902 minutes, Δt: 14.969 minutes, CFL: 7.50e-01
i:    240, sim time: 3.892 days, wall time: 9.909 minutes, Δt: 13.606 minutes, CFL: 7.50e-01
i:    260, sim time: 4.073 days, wall time: 9.916 minutes, Δt: 12.738 minutes, CFL: 7.50e-01
i:    280, sim time: 4.237 days, wall time: 9.922 minutes, Δt: 12.460 minutes, CFL: 7.50e-01
i:    300, sim time: 4.402 days, wall time: 9.929 minutes, Δt: 12.642 minutes, CFL: 7.50e-01
i:    320, sim time: 4.571 days, wall time: 9.936 minutes, Δt: 12.944 minutes, CFL: 7.50e-01
i:    340, sim time: 4.742 days, wall time: 9.942 minutes, Δt: 13.443 minutes, CFL: 7.50e-01
i:    360, sim time: 4.926 days, wall time: 9.950 minutes, Δt: 13.946 minutes, CFL: 7.50e-01
i:    380, sim time: 5.112 days, wall time: 9.957 minutes, Δt: 14.038 minutes, CFL: 7.50e-01
i:    400, sim time: 5.296 days, wall time: 9.963 minutes, Δt: 13.134 minutes, CFL: 7.50e-01
i:    420, sim time: 5.461 days, wall time: 9.970 minutes, Δt: 13.004 minutes, CFL: 7.50e-01
i:    440, sim time: 5.639 days, wall time: 9.977 minutes, Δt: 13.459 minutes, CFL: 7.50e-01
i:    460, sim time: 5.825 days, wall time: 9.984 minutes, Δt: 13.259 minutes, CFL: 7.50e-01
i:    480, sim time: 5.998 days, wall time: 9.991 minutes, Δt: 12.949 minutes, CFL: 7.50e-01
i:    500, sim time: 6.161 days, wall time: 9.998 minutes, Δt: 12.405 minutes, CFL: 7.50e-01
i:    520, sim time: 6.327 days, wall time: 10.005 minutes, Δt: 12.132 minutes, CFL: 7.50e-01
i:    540, sim time: 6.492 days, wall time: 10.012 minutes, Δt: 11.912 minutes, CFL: 7.50e-01
i:    560, sim time: 6.649 days, wall time: 10.019 minutes, Δt: 11.516 minutes, CFL: 7.50e-01
i:    580, sim time: 6.794 days, wall time: 10.026 minutes, Δt: 10.277 minutes, CFL: 7.50e-01
i:    600, sim time: 6.931 days, wall time: 10.033 minutes, Δt: 10.152 minutes, CFL: 7.50e-01
i:    620, sim time: 7.070 days, wall time: 10.039 minutes, Δt: 10.159 minutes, CFL: 7.50e-01
i:    640, sim time: 7.209 days, wall time: 10.046 minutes, Δt: 10.153 minutes, CFL: 7.50e-01
i:    660, sim time: 7.347 days, wall time: 10.053 minutes, Δt: 9.919 minutes, CFL: 7.50e-01
i:    680, sim time: 7.477 days, wall time: 10.059 minutes, Δt: 9.501 minutes, CFL: 7.50e-01
i:    700, sim time: 7.602 days, wall time: 10.066 minutes, Δt: 9.139 minutes, CFL: 7.50e-01
i:    720, sim time: 7.717 days, wall time: 10.072 minutes, Δt: 9.082 minutes, CFL: 7.50e-01
i:    740, sim time: 7.832 days, wall time: 10.078 minutes, Δt: 7.954 minutes, CFL: 7.50e-01
i:    760, sim time: 7.933 days, wall time: 10.086 minutes, Δt: 8.031 minutes, CFL: 7.50e-01
i:    780, sim time: 8.039 days, wall time: 10.092 minutes, Δt: 7.964 minutes, CFL: 7.50e-01
i:    800, sim time: 8.144 days, wall time: 10.098 minutes, Δt: 7.978 minutes, CFL: 7.50e-01
i:    820, sim time: 8.250 days, wall time: 10.105 minutes, Δt: 8.060 minutes, CFL: 7.50e-01
i:    840, sim time: 8.361 days, wall time: 10.112 minutes, Δt: 8.050 minutes, CFL: 7.50e-01
i:    860, sim time: 8.471 days, wall time: 10.118 minutes, Δt: 7.687 minutes, CFL: 7.50e-01
i:    880, sim time: 8.573 days, wall time: 10.124 minutes, Δt: 7.569 minutes, CFL: 7.50e-01
i:    900, sim time: 8.678 days, wall time: 10.131 minutes, Δt: 8.003 minutes, CFL: 7.50e-01
i:    920, sim time: 8.783 days, wall time: 10.137 minutes, Δt: 8.073 minutes, CFL: 7.50e-01
i:    940, sim time: 8.895 days, wall time: 10.144 minutes, Δt: 8.170 minutes, CFL: 7.50e-01
i:    960, sim time: 9.006 days, wall time: 10.151 minutes, Δt: 8.690 minutes, CFL: 7.50e-01
i:    980, sim time: 9.129 days, wall time: 10.157 minutes, Δt: 9.193 minutes, CFL: 7.50e-01
i:   1000, sim time: 9.257 days, wall time: 10.164 minutes, Δt: 10.098 minutes, CFL: 7.50e-01
i:   1020, sim time: 9.399 days, wall time: 10.170 minutes, Δt: 10.326 minutes, CFL: 7.50e-01
i:   1040, sim time: 9.536 days, wall time: 10.178 minutes, Δt: 10.397 minutes, CFL: 7.50e-01
i:   1060, sim time: 9.667 days, wall time: 10.184 minutes, Δt: 9.339 minutes, CFL: 7.50e-01
i:   1080, sim time: 9.787 days, wall time: 10.191 minutes, Δt: 8.588 minutes, CFL: 7.50e-01
i:   1100, sim time: 9.901 days, wall time: 10.197 minutes, Δt: 8.080 minutes, CFL: 7.50e-01
[ Info: Simulation is stopping after running for 10.203 minutes.
[ Info: Simulation time 10 days equals or exceeds stop time 10 days.
i:   1120, sim time:    10 days, wall time: 10.203 minutes, Δt: 8.007 minutes, CFL: 7.50e-01

[ Info: Saved animation to /rds/user/ab3191/hpc-work/software/OceanBioME.jl/profiling/LOBSTER.mp4

real    16m10.831s
user    15m27.445s
sys     0m11.515s
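A simple way to compare the two runs is the steady-state cost per time step rather than the total runtime, since the wall-time counter in both logs also includes initialization and the first time step. The Python sketch below parses the i = 20 and i = 1100 progress lines of each run and averages the wall time over the 1080 steps in between. The helper name `seconds_per_step` is hypothetical, and the regular expression assumes exactly the log format shown above.

```python
import re

# Progress lines copied verbatim from the two logs above (first and last
# steady-state reports only); the regex below assumes this exact format.
CPU_LOG = """\
i:     20, sim time:    6 hours, wall time: 2.773 minutes, Δt: 25.164 minutes, CFL: 4.74e-01
i:   1100, sim time: 9.910 days, wall time: 4.578 minutes, Δt: 9.959 minutes, CFL: 7.50e-01
"""

GPU_LOG = """\
i:     20, sim time:    6 hours, wall time: 9.822 minutes, Δt: 25.164 minutes, CFL: 4.70e-01
i:   1100, sim time: 9.901 days, wall time: 10.197 minutes, Δt: 8.080 minutes, CFL: 7.50e-01
"""

# Captures the iteration number and a wall time reported in minutes.
PROGRESS = re.compile(r"i:\s*(\d+),.*?wall time:\s*([\d.]+) minutes")


def seconds_per_step(log: str) -> float:
    """Average wall-clock seconds per iteration between the first and last progress line."""
    points = [(int(i), float(m)) for i, m in PROGRESS.findall(log)]
    (i0, t0), (i1, t1) = points[0], points[-1]
    return (t1 - t0) * 60.0 / (i1 - i0)


cpu = seconds_per_step(CPU_LOG)
gpu = seconds_per_step(GPU_LOG)
print(f"CPU: {cpu:.4f} s/step, GPU: {gpu:.4f} s/step, speed-up: {cpu / gpu:.1f}x")
```

With these lines the sketch gives about 0.10 s per step on the CPU and about 0.021 s per step on the GPU, i.e. once the simulation is running the GPU steps roughly 4.8 times faster. Read this way, the GPU's longer total runtime comes from initialization (8.956 versus 2.172 minutes) rather than from time stepping. Using only two points per run averages over irregular intervals, such as the slower CPU interval between i = 860 and i = 880.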

AdelekeBankole · Aug 05 '25 13:08

@AdelekeBankole, two questions:

  1. Have you done (or could you do) an Nsight run for both, or do you otherwise have a per-function breakdown of runtime, and potentially memory usage (maybe using the Julia profiler)?
  2. (maybe for @jagoosw or @johnryantaylor): I can see that an mp4 file is written out. I assume that accounts for a significant part of the runtime (I/O usually does). Should we minimise I/O (is there a switch to avoid writing out the movie file?) so that we can concentrate on the computational workload?
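The timings already posted give a first, rough answer to the second question. The shell's `real` time covers the whole script, whereas the "running for" message covers only the simulation (initialization, the first time step and the time-stepping loop), so their difference is time spent outside the simulation: Julia start-up and package loading, model setup and writing the animation, among other things. The Python sketch below does this arithmetic with values copied from the two logs above; the helper name `breakdown` is hypothetical, and the sketch cannot isolate the I/O share, which would still need a profiler run.

```python
# Timings copied from the CPU and GPU logs in this thread, all in seconds.
RUNS = {
    # real: shell `real` time; sim: "Simulation is stopping after running for ...";
    # init: "simulation initialization complete"; first: "initial time step complete".
    "CPU": {"real": 9 * 60 + 39.587, "sim": 4.601 * 60, "init": 2.172 * 60, "first": 32.920},
    "GPU": {"real": 16 * 60 + 10.831, "sim": 10.203 * 60, "init": 8.956 * 60, "first": 50.104},
}


def breakdown(run: dict) -> dict:
    """Split the script's real time into three coarse phases (in seconds)."""
    outside = run["real"] - run["sim"]    # start-up, package loading, setup, animation, ...
    startup = run["init"] + run["first"]  # simulation initialization + first time step
    stepping = run["sim"] - startup       # remaining time-stepping loop
    return {"outside": outside, "startup": startup, "stepping": stepping}


for name, run in RUNS.items():
    parts = breakdown(run)
    summary = ", ".join(
        f"{phase}: {secs:6.1f} s ({100 * secs / run['real']:4.1f}%)" for phase, secs in parts.items()
    )
    print(f"{name}: {summary}")
```

With these numbers, roughly half of the CPU script's real time (about 5 minutes) and over a third of the GPU script's (about 6 minutes) is spent outside the simulation, while the time-stepping loop itself takes roughly 113 s on the CPU and 25 s on the GPU. On the GPU, initialization and the first time step dominate; in Julia this is plausibly just-in-time compilation, but that is an assumption a profiler run would need to confirm.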

MarionBWeinzierl · Aug 08 '25 10:08