FFT Rework
The current state of IPPL's FFT has a few problems.
CPU
For the CPU case, the default Kokkos::View layout is LayoutRight (row-major), so this loop is effectively a transposition, which makes it slow.
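To make the layout mismatch concrete, here is a minimal sketch of the strides implied by the two layouts for a 3D array. It uses plain C++ rather than Kokkos so it stands on its own; the names `Extents`, `stridesRight`, `stridesLeft` and `offset` are made up for illustration and are not part of IPPL, Kokkos or heFFTe.

```cpp
#include <array>
#include <cstddef>

using Extents = std::array<std::size_t, 3>;

// Strides (in elements) of a row-major array, as in Kokkos::LayoutRight:
// the last index is contiguous in memory.
constexpr Extents stridesRight(const Extents& n) {
    return {n[1] * n[2], n[2], 1};
}

// Strides of a column-major array, as in Kokkos::LayoutLeft:
// the first index is contiguous in memory.
constexpr Extents stridesLeft(const Extents& n) {
    return {1, n[0], n[0] * n[1]};
}

// Linear offset of element (i, j, k) for a given stride triple.
constexpr std::size_t offset(const Extents& s, std::size_t i, std::size_t j,
                             std::size_t k) {
    return i * s[0] + j * s[1] + k * s[2];
}
```

For extents {4, 3, 2}, a unit step in the first index moves by 6 elements in LayoutRight but by 1 element in LayoutLeft. Copying element by element between the two therefore touches one side with a large stride, which is the transposition cost described above.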
heFFTe's source code states:
//! \brief Constructs a box from the low and high indexes, the span in each direction includes the low and high (uses default order).
box3d(std::array<index, 3> clow, std::array<index, 3> chigh) :
low(clow), high(chigh), size(...), order({0, 1, 2})
{}
This corresponds to LayoutLeft (column-major, index 0 is the fastest), though for CPU runs the order could be changed to {2,1,0} to match LayoutRight.
Interestingly, the FFTW docs on column-major format (which is LayoutLeft in Kokkos; FFTW itself expects row-major, i.e. LayoutRight) state that simply passing the dimensions in reverse order is enough.
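The reversal argument can be checked with a small sketch: a column-major array with extents (n0, n1, n2) occupies memory exactly like a row-major array with extents (n2, n1, n0) whose indices are read in reverse. The helper names below are hypothetical and nothing here calls FFTW or heFFTe.

```cpp
#include <array>
#include <cstddef>

using Extents = std::array<std::size_t, 3>;

// Offset of (i, j, k) in a column-major array with extents n.
std::size_t colMajorOffset(const Extents& n, std::size_t i, std::size_t j,
                           std::size_t k) {
    return i + n[0] * (j + n[1] * k);
}

// Offset of (i, j, k) in a row-major array with extents n.
std::size_t rowMajorOffset(const Extents& n, std::size_t i, std::size_t j,
                           std::size_t k) {
    return (i * n[1] + j) * n[2] + k;
}

// True if every element (i, j, k) of the column-major array sits at the same
// offset as element (k, j, i) of a row-major array with reversed extents.
bool reversalMatches(const Extents& n) {
    const Extents r{n[2], n[1], n[0]};
    for (std::size_t i = 0; i < n[0]; ++i)
        for (std::size_t j = 0; j < n[1]; ++j)
            for (std::size_t k = 0; k < n[2]; ++k)
                if (colMajorOffset(n, i, j, k) != rowMajorOffset(r, k, j, i))
                    return false;
    return true;
}
```

So no data has to move: the same buffer is simply described to the library with the dimension order reversed.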
Current performance
Comparing the current IPPL FFT to a plain FFTW call yields a discrepancy of a factor of 42 in the time required for an R2C transform.
So we would need to do something like:
if on CPU change order to {2,1,0}
Yes, if heFFTe doesn't have some other restrictions. It would also be nice if the boxes could be picked to exclude ghost cells, but I don't know whether that would be possible.
I think this needs to be discussed with Sri (when he is back from India) and maybe Veronica
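As a starting point for that discussion, the proposed order switch could look roughly like the sketch below. `LayoutLeftTag`, `LayoutRightTag` and `fftOrder` are hypothetical stand-ins: in IPPL the choice would presumably key off the view's `array_layout` (Kokkos::LayoutLeft or Kokkos::LayoutRight), and the result would be passed to a heFFTe box3d as its order. Whether heFFTe imposes further restrictions on a non-default order is not checked here.

```cpp
#include <array>
#include <type_traits>

// Hypothetical stand-ins for Kokkos::LayoutLeft and Kokkos::LayoutRight.
struct LayoutLeftTag {};
struct LayoutRightTag {};

// Hypothetical helper: dimension order (fastest index first) that describes
// a buffer with the given layout.
template <class Layout>
constexpr std::array<int, 3> fftOrder() {
    if constexpr (std::is_same_v<Layout, LayoutRightTag>) {
        return {2, 1, 0};  // row-major: the last index is contiguous
    } else {
        return {0, 1, 2};  // column-major: matches the default order above
    }
}
```

Selecting on the layout type rather than on "CPU vs. GPU" would keep the choice correct if a view is ever created with a non-default layout.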