CarpetX: Added interpolation caches: Stencils and Particles
Overview
This PR provides two optimizations for CarpetX's interpolator, based on caching:
- Cache interpolation stencil positions and offsets for all particles. This information can be reused across multiple variables.
- Cache the AMReX particle containers used in interpolation. When the interpolation targets don't change (i.e., when the grid does not change), these containers can be reused. This skips the expensive operations of creating AMReX particles and distributing them across the domain.
I expect these optimizations to be important for multipatch runs that don't change their grids very often (or at all).
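As a rough illustration of the first optimization: the stencil data for a particle depends only on its position and on the grid, not on the interpolated variable, so it can be computed once and reused for every variable. A minimal sketch (the types and names below are illustrative, not CarpetX's actual ones):

```cpp
#include <AMReX_IntVect.H>
#include <AMReX_REAL.H>
#include <vector>

// Illustrative only: per-particle stencil data that is independent of the
// interpolated variable, so it can be computed once per grid configuration.
struct StencilEntry {
  amrex::IntVect anchor; // lower corner of the stencil in index space
  amrex::Real offset[3]; // sub-cell offsets used to evaluate the weights
};

// One entry per interpolation target; valid until the grid changes.
std::vector<StencilEntry> stencil_cache;
```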
Benchmarks
These are Cactus-timer-based benchmarks that measure CapyrX::MultiPatch1_Interpolate, which indirectly calls CarpetX_Interpolate.
The benchmark runs 4 BSSN iterations with 7 levels of mesh refinement and no regridding, using 8 MPI processes and 5 OpenMP threads per process.
Cactus timer results for the benchmark before the optimization:
======================================================================
% Time/s Min/s Max/s Timer (gettimeofday)
======================================================================
100.0 170.931 170.930 170.931 CCTK total time
. . .
48.6 83.122 82.376 84.132 Sync
. . .
6.3 10.854 10.233 11.643 CapyrX::MultiPatch1_Interpolate
Cactus timer results for the benchmark after the optimization:
======================================================================
% Time/s Min/s Max/s Timer (gettimeofday)
======================================================================
100.0 164.690 164.690 164.690 CCTK total time
. . .
46.6 76.758 76.050 77.963 Sync
. . .
4.5 5.685 5.068 6.478 CapyrX::MultiPatch1_Interpolate
Speedups using avg. time (old / new):
Total Runtime: 3.8%
Sync: 8.3%
Interpolation: 90.9%
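Here, speedup = (old avg. time) / (new avg. time) − 1; for example, the interpolation speedup is 10.854 / 5.685 − 1 ≈ 90.9%, i.e., the old code spends roughly 1.9× as much time interpolating as the new code.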
The ticket is here: https://bitbucket.org/einsteintoolkit/tickets/issues/2898/carpetx-interpolator-caches
I like this idea. We did the same in Carpet.
Instead of using an automatic cache, I recommend splitting the interpolation function into two. The first only sets things up, without interpolating anything, and returns an "interpolation setup". The second function would take an interpolation setup as input and perform the interpolation. In this way there is no automatic mechanism and no checking needed.
In Carpet we also have a "world age" that increases whenever the grid setup changes. The world age is stored in the interpolation setup. When the world age does not match, the interpolation setup is recreated automatically.
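A minimal sketch of that split, with hypothetical names and signatures (this is not Carpet's or CarpetX's actual API):

```cpp
// Hypothetical two-function split: setup and interpolation are separate.
struct InterpolationSetup {
  int world_age; // grid-setup generation this setup was built for
  // ... stencil positions/offsets, redistributed particle data, etc. ...
};

// Builds the setup only; performs no interpolation.
InterpolationSetup setup_interpolation(int npoints, const double *x,
                                       const double *y, const double *z);

// Consumes a setup and interpolates the requested variables into `results`.
void interpolate_from_setup(const InterpolationSetup &setup, int nvars,
                            const int *varinds, double *const *results);

// Callers that keep a setup across iterations can rebuild it lazily:
//   if (setup.world_age != current_world_age())
//     setup = setup_interpolation(npoints, x, y, z);
```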
I'm failing to see how an InterpolationSetup data structure that holds things the interpolator needs, and that has an age parameter that automatically recreates the setup when needed, is any different from an automatic cache that holds the AMReX particles and recreates/redistributes them when necessary. It seems to me that these achieve the same thing at a conceptual level, only with different names?
Once I understand the difference I can begin working on that. It will probably be significantly different from the current state of this PR, though (I think I will basically have to start from scratch), so maybe change it to a draft?
Maybe see: https://bitbucket.org/einsteintoolkit/tickets/issues/2898/carpetx-interpolator-caches?iframe=true&spa=0#comment-69249901
The main difference is that an interpolation setup can be used right away by the interpolator: it does not need to search a cache, and it does not need to ensure that the cache doesn't overflow.
Ok, so I hadn't seen Roland's Bitbucket comment; sorry about that. I need to look at CarpetInterp2 more thoroughly to try to understand what exactly it is doing.
But I think it would be useful (for my understanding) to get a higher-level picture of what you would like to have, so I can plan how the architecture of this would work:
- CarpetX has a global variable called epoch. It starts at 0. CarpetX automatically tracks which epoch it is in by increasing the epoch value whenever it has to (grid changes or something else).
- There are global data structures (say, particle_cache and interp_stencil_cache) that hold interpolation data. These structures hold the properly redistributed AMReX particle data for the interpolation targets and the interpolation stencils. They also record which epoch this interpolation data is valid for. One could probably make them a single structure (say, interp_data).
- There's a new function (say, interp_setup) that accesses interp_data and reads its stored epoch. It then compares it against CarpetX's current epoch. If they differ, this function recreates the necessary data and updates the global data store. CarpetX_Interpolate would now call interp_setup, access the stored interpolation data, and perform the interpolation using that. No changes to the CarpetX_Interpolate API would then be necessary.
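In code, the flow I have in mind would be roughly the following (everything here is a hypothetical sketch, not existing CarpetX code):

```cpp
// Hypothetical sketch of the epoch-based flow described above.
static int epoch = 0; // bumped by CarpetX whenever the grid setup changes

struct InterpData {
  int valid_epoch = -1; // epoch the cached data was built for
  // cached (redistributed) AMReX particles and interpolation stencils ...
};
static InterpData interp_data;

// Rebuilds the cached data only when the stored epoch is stale.
static void interp_setup(int npoints, const double *x, const double *y,
                         const double *z) {
  if (interp_data.valid_epoch != epoch) {
    // ... recreate and redistribute the particles for the npoints targets
    //     at (x, y, z), and recompute the stencils ...
    interp_data.valid_epoch = epoch;
  }
}

// CarpetX_Interpolate would call interp_setup(...) first, then interpolate
// using interp_data, leaving its public API unchanged.
```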
If the above is correct, then I think I know how to get started. But I would also say that this is still a cache and is still automatic, in some sense, even though I understand why it is different from what I did, and how much simpler the epoch approach would potentially be than what I came up with originally.
Converted to draft while I rework the implementation.
Based on feedback, I've significantly changed this PR. I think the caching change is complicated and we ought to do it in phases.
In this PR, I'm implementing the simplest change: I've split CarpetX_Interpolate into two:
- Relevant interpolation data is gathered and saved in InterpolationSetup. Currently, I'm saving only the AMReX particles, but saving more data in the future (such as the interpolation stencils themselves) should be easy to do.
- InterpolateFromSetup is called with an interpolation setup and result pointers, and the interpolation is performed.
Currently, I've changed CarpetX_Interpolate to use this mechanism internally. This ensures that no external user of the CarpetX interpolation routines will notice any changes unless they opt in to the new API.
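For illustration, usage of the split might look roughly like this; only InterpolationSetup and InterpolateFromSetup come from this PR, while MakeInterpolationSetup and all signatures are guesses:

```cpp
// Hypothetical usage sketch of the two-step API.
void example(int npoints, const double *x, const double *y, const double *z,
             int nvars, const int *varinds, double *const *results) {
  // 1. Build the setup once: gathers the targets and creates/redistributes
  //    the AMReX particles.
  InterpolationSetup setup = MakeInterpolationSetup(npoints, x, y, z);

  // 2. Interpolate any number of variables reusing the same setup.
  InterpolateFromSetup(setup, nvars, varinds, results);

  // CarpetX_Interpolate now performs both steps internally, so existing
  // callers see no behavioral change.
}
```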