
proof of concept: optimized method for special combinations of solver/equations

Open ranocha opened this issue 3 years ago • 1 comments

This is basically a proof of concept, demonstrating how to optimize some special cases even further. For example, by optimizing the weak form volume integral for linear scalar advection, I get a performance improvement of ca. 2x for a complete RHS evaluation. It would be really nice to get the same kind of improvement for hyperbolic diffusion and hence for self-gravity. However, there are some open issues with other repos (see notes in the source code). Even if these were resolved, the currently possible improvements for hyperbolic diffusion are significantly smaller than for linear advection. My first guess is that this is related to our current choice of memory layout (array of structures). I would expect to gain more performance for this case by switching to a structure of arrays (per element).
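To make the layout question concrete, here is a minimal sketch of the two orderings for a 2D solution array. It assumes the current array-of-structures-like layout puts the variable index first, and that "structure of arrays per element" means the node indices come first within each element; the sizes and names (`u_aos`, `u_soa`) are made up for illustration and are not taken from the Trixi source.

```julia
# Hypothetical sizes: 3 variables, 4 nodes per direction, 2 elements.
nvariables, nnodes, nelements = 3, 4, 2

# Array-of-structures-like layout (assumed): the variable index is fastest,
# so all variables at one node are contiguous in memory.
u_aos = zeros(nvariables, nnodes, nnodes, nelements)

# Structure of arrays per element (assumed): the node indices are fastest,
# so one variable is contiguous across all nodes of an element.
u_soa = zeros(nnodes, nnodes, nvariables, nelements)

# Fill both arrays with the same data.
for element in 1:nelements, j in 1:nnodes, i in 1:nnodes, v in 1:nvariables
    value = v + 10 * i + 100 * j + 1000 * element
    u_aos[v, i, j, element] = value
    u_soa[i, j, v, element] = value
end

# Converting between the layouts is a permutation of dimensions.
u_converted = permutedims(u_aos, (2, 3, 1, 4))

# The strides show which index is contiguous in memory.
strides_aos = strides(u_aos)
strides_soa = strides(u_soa)
```

In the second layout, an inner loop over `i` for a fixed variable touches consecutive memory locations, which is the access pattern SIMD vectorization tends to prefer. Whether that actually pays off for hyperbolic diffusion would have to be benchmarked.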

It's annoying to be the victim of nerd-sniping again... :wink:

Here are some issues that should be resolved before merging this:

  • [ ] Fix test failures :wink: Currently, it looks like the errors are mainly due to mateuszbaran/HybridArrays.jl#39 and https://github.com/mateuszbaran/HybridArrays.jl/pull/42.
  • [ ] Decide how to approach optimizations like these
    • Use multiple dispatch to redirect specific combinations of solvers/equations to optimized implementations (as done in this draft). That's easy, but users might want to have more flexibility, and it would be easier for developers to track performance and correctness if both our standard and optimized implementations were easily available side by side.
    • Use multiple dispatch to let users opt in to optimized methods. That seems to be rather flexible and composable, but we will have to decide where such a trait should live and how exactly we want to implement it.
    • Use some global switch to allow optimized methods. I dislike this option since it doesn't follow the approach of Trixi as a library and instead reflects the mindset of Trixi as a monolith.
  • [ ] Handle different memory layout for hyperbolic diffusion
  • [ ] Check precompile statements (and effect of latency)
  • [ ] Check performance of other memory layouts, maybe even full SoA as in (nnodes, nnodes, nelements, nvariables)
  • [ ] Document why we can't use @avx for our general kernels right now
  • [ ] Check performance for different polynomial degrees on different architectures. Can we see the benefits of AVX512?
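The opt-in variant from the list above could look roughly like the following sketch. All names here (`KernelStyle`, `GenericKernel`, `OptimizedKernel`, `volume_kernel!`, the simplified 1D `LinearAdvection` type) are hypothetical and only illustrate the dispatch pattern; they are not the implementation in this draft or the Trixi API.

```julia
# Hypothetical stand-ins for equation types.
abstract type AbstractEquations end
struct LinearAdvection <: AbstractEquations
    speed::Float64
end

# Hypothetical trait that lets the user choose the kernel explicitly.
abstract type KernelStyle end
struct GenericKernel <: KernelStyle end
struct OptimizedKernel <: KernelStyle end

# Pointwise physics: the only thing new equations need to provide.
flux(u, equations::LinearAdvection) = equations.speed * u

# Generic kernel: works for any equations with a pointwise `flux`.
function volume_kernel!(du, u, D, equations::AbstractEquations, ::GenericKernel)
    for i in axes(D, 1)
        acc = zero(eltype(du))
        for j in axes(D, 2)
            acc += D[i, j] * flux(u[j], equations)
        end
        du[i] = acc
    end
    return du
end

# Fallback: if no optimized method exists, use the generic kernel.
function volume_kernel!(du, u, D, equations::AbstractEquations, ::OptimizedKernel)
    return volume_kernel!(du, u, D, equations, GenericKernel())
end

# Specialized kernel: exploits the linear flux and uses a SIMD-friendly
# column-major loop order. Assumes `du`, `u`, and `D` have matching sizes.
function volume_kernel!(du, u, D, equations::LinearAdvection, ::OptimizedKernel)
    fill!(du, zero(eltype(du)))
    @inbounds for j in axes(D, 2)
        scaled_uj = equations.speed * u[j]
        @simd for i in axes(D, 1)
            du[i] += D[i, j] * scaled_uj
        end
    end
    return du
end

# Usage: both implementations stay available side by side.
D = [-1.5 2.0 -0.5; -0.5 0.0 0.5; 0.5 -2.0 1.5]
u = [1.0, 2.0, 4.0]
equations = LinearAdvection(2.0)
du_generic = volume_kernel!(zeros(3), u, D, equations, GenericKernel())
du_optimized = volume_kernel!(zeros(3), u, D, equations, OptimizedKernel())
```

Because the trait is an ordinary argument, the standard and optimized code paths can be compared directly in tests and benchmarks, which addresses the concern about tracking correctness and performance. Where such a trait should live in the solver hierarchy remains open.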

For the future, it would be really great if we could make use of LoopVectorization in our general kernels while still being able to implement new physics only at the pointwise level as we do right now in Trixi.

ranocha, Jan 30 '21 14:01

It looks like the errors are mainly due to https://github.com/mateuszbaran/HybridArrays.jl/issues/39.

ranocha, Jan 31 '21 07:01