Split Elements Into Frontend/Compute?
As we add more and more properties for book-keeping (e.g., name, #705) and for pre-computed constants (see #850), it might be time to split our element classes into:
- a user-facing (CPU-only) class, and
- a compute (e.g., GPU-copyable) class.
That way, we:
- have a clear place to pre-compute constants (when creating the compute class and/or updating it with changed parameters)
- avoid copying extra parameters to the device (the name is not needed there, and some user-provided element properties are fully ignored once we calculate the compute constants, such as R12 for linear elements)
- might be able to create elements (e.g., in Python) outside the simulation lifetime, i.e., before AMReX init and after AMReX finalize are called, e.g., in optimizer workflows, ABEL, etc.
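A minimal sketch of the split listed above, in Python for brevity. All names here (`Quad`, `QuadCompute`, `to_compute`, `push`) are hypothetical and not existing API. The pre-computed constants are the textbook linear transfer-matrix entries for the focusing plane of a thick quadrupole, used only to illustrate where pre-computation would happen.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuadCompute:
    """Compute element (hypothetical): plain numbers only, trivially copyable."""
    r11: float
    r12: float
    r21: float
    r22: float

    def push(self, x: float, px: float) -> Tuple[float, float]:
        """Apply the pre-computed linear map to one (x, px) pair."""
        return (self.r11 * x + self.r12 * px,
                self.r21 * x + self.r22 * px)


@dataclass
class Quad:
    """User-facing, CPU-only element (hypothetical): user parameters and book-keeping."""
    ds: float        # segment length
    k: float         # focusing strength (this sketch assumes k > 0)
    name: str = ""   # book-keeping only, never copied into the compute class

    def to_compute(self) -> QuadCompute:
        """Pre-compute constants once, when the compute element is built."""
        if self.k <= 0.0:
            raise ValueError("this sketch only covers the focusing case k > 0")
        omega = math.sqrt(self.k)
        c = math.cos(omega * self.ds)
        s = math.sin(omega * self.ds)
        return QuadCompute(r11=c, r12=s / omega, r21=-omega * s, r22=c)


quad = Quad(ds=0.5, k=2.0, name="q1")  # needs no running simulation
compute = quad.to_compute()            # rebuild after changing parameters
x, px = compute.push(1.0e-3, 0.0)
```

The compute class carries neither `name` nor the raw `k`, so only the constants actually used in the push would need to be copied to the device.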
This needs some discussion and experimentation, and could be combined with the efforts to implement PALS.
When introducing a "frontend" wrapper for elements, we could also expose the various models through a common structure like this:
Quad(ds=..., k=..., model="linear") # default
Quad(ds=..., k=..., model="chromatic")
Quad(ds=..., k=..., model="exact")
That way, we would focus the naming of elements on the physics and can select the model as a property.
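One possible way to wire this up is a frontend that looks up a compute backend by model name. This is a sketch under assumptions: the backend classes `LinearQuad`, `ChromaticQuad`, `ExactQuad`, the `QUAD_BACKENDS` table, and `to_compute` are hypothetical placeholders, not existing classes.

```python
from dataclasses import dataclass


# Hypothetical compute backends, one per model; real ones would hold
# pre-computed constants and the push implementation.
@dataclass(frozen=True)
class LinearQuad:
    ds: float
    k: float


@dataclass(frozen=True)
class ChromaticQuad:
    ds: float
    k: float


@dataclass(frozen=True)
class ExactQuad:
    ds: float
    k: float


# Maps the user-facing model name to a compute backend (hypothetical table).
QUAD_BACKENDS = {
    "linear": LinearQuad,
    "chromatic": ChromaticQuad,
    "exact": ExactQuad,
}


@dataclass
class Quad:
    """Frontend named after the physics; the model is just a property."""
    ds: float
    k: float
    model: str = "linear"

    def to_compute(self):
        """Create the compute backend selected by the model property."""
        try:
            backend = QUAD_BACKENDS[self.model]
        except KeyError:
            raise ValueError(f"unknown model for Quad: {self.model!r}") from None
        return backend(ds=self.ds, k=self.k)


default_backend = Quad(ds=1.0, k=0.5).to_compute()
chromatic_backend = Quad(ds=1.0, k=0.5, model="chromatic").to_compute()
```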
Alternatively, one could fully ignore the model at this level, focus only on the physics, and select the model during tracking (for a whole segment). This separates concerns a bit more, but could be somewhat inflexible in practice (?) and might often need to downgrade to the next-best available model when the requested one is not available for an element.
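The downgrade logic for the segment-level variant could look roughly like the sketch below. The fidelity ordering (exact, then chromatic, then linear) is an assumption, and the `AVAILABLE` table with its element kinds is made up for illustration; it does not reflect which models actually exist for which element.

```python
# Assumed fidelity ordering, highest first.
MODEL_ORDER = ["exact", "chromatic", "linear"]

# Hypothetical availability table: element kind -> implemented models.
AVAILABLE = {
    "ElementA": {"linear", "chromatic", "exact"},
    "ElementB": {"linear", "exact"},
    "ElementC": {"linear"},
}


def resolve_model(kind: str, requested: str) -> str:
    """Return the requested model, or the next-best lower one that exists."""
    start = MODEL_ORDER.index(requested)  # raises ValueError if unknown
    for model in MODEL_ORDER[start:]:
        if model in AVAILABLE[kind]:
            return model
    raise ValueError(f"no model at or below {requested!r} for {kind}")


# One model requested for the whole segment, resolved per element kind.
segment = ["ElementA", "ElementB", "ElementC"]
resolved = [resolve_model(kind, "chromatic") for kind in segment]
```

Reporting the resolved model per element would make silent downgrades visible to the user.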
Another benefit of splitting off the compute part would be that we can compile each element in its own .cpp file. Currently, they all end up in the same TU, which does not scale well in terms of compile time.
Another aspect we could address in the backend: we keep adding fallbacks to drifts for elements with zero fields. Currently, these fallbacks live inside the hot compute loop, which potentially slows down the code and increases register usage.
Using a drift backend for a quad with disabled fields would be more computationally sensible.
In other cases, separate focusing and defocusing implementations add a similar 2x code split. (In some cases, this can be rewritten using complex math, but the amount of code ends up about the same and it may even add unnecessary computations in place of a branch.)
A similar result could be achieved by checking and optimizing a lattice before starting tracking, e.g., replacing a disabled ChrQuad with ChrDrift, in a workflow similar to #848.
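Such a pre-tracking pass might be sketched as follows. `ChrQuad` and `ChrDrift` here are simplified stand-ins with assumed fields (`ds`, `k`), not the real element classes, and `optimize_lattice` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ChrDrift:
    """Stand-in for a chromatic drift (assumed field: ds)."""
    ds: float


@dataclass(frozen=True)
class ChrQuad:
    """Stand-in for a chromatic quad (assumed fields: ds, k)."""
    ds: float
    k: float


Element = Union[ChrDrift, ChrQuad]


def optimize_lattice(lattice: List[Element]) -> List[Element]:
    """Replace quads with zero strength by drifts of the same length."""
    optimized = []
    for el in lattice:
        if isinstance(el, ChrQuad) and el.k == 0.0:
            optimized.append(ChrDrift(ds=el.ds))  # field disabled: plain drift
        else:
            optimized.append(el)
    return optimized


lattice = [ChrDrift(ds=1.0), ChrQuad(ds=0.5, k=0.0), ChrQuad(ds=0.5, k=1.2)]
optimized = optimize_lattice(lattice)
```

With the branch resolved once per element before tracking, the per-particle push would no longer need a zero-field check.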