Cabana
Efficient memory use in FFTs
Simple improvements to memory use in the Cabana FFT implementation could have an outsize impact on performance.
- Get rid of the data copies. Instead of converting between various types of complex data, we could use `Kokkos::complex<Scalar>`. The issue with this was the alignment in `Kokkos::complex`, but could be resolved with `-DKOKKOS_ENABLE_COMPLEX_ALIGN=ON`
- heFFTe may be allocating a work buffer for each FFT? If so we should pass it a work buffer to use. This could speed things up significantly.
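
A rough sketch of both ideas in plain C++, to make the discussion concrete. Everything named here (`AlignedComplex`, `ToyTransform`, `workspaceSize`, `repeatedForward`) is hypothetical and is not the Cabana, Kokkos, or heFFTe API. `AlignedComplex` only stands in for a complex type whose alignment equals its full size, and `ToyTransform` reverses its input rather than computing an FFT. The `static_assert`s show the kind of layout check that would need to hold before a buffer could be reinterpreted instead of copied, and `repeatedForward` shows a scratch buffer allocated once and handed to every transform call.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a complex type aligned to its full size
// (an assumption for illustration, not the Kokkos definition).
struct alignas(2 * sizeof(double)) AlignedComplex
{
    double re;
    double im;
};

// Layout checks that would need to pass before reinterpreting a buffer
// of one complex type as another instead of copying element by element.
static_assert(sizeof(AlignedComplex) == 2 * sizeof(double),
              "no padding between or after the two components");
static_assert(sizeof(AlignedComplex) == sizeof(std::complex<double>),
              "same size as std::complex<double>");
static_assert(alignof(AlignedComplex) == 2 * sizeof(double),
              "aligned to the full complex value");

// Hypothetical transform that needs scratch space. It reverses the input;
// it is NOT an FFT, it only mimics the "caller supplies workspace" shape.
class ToyTransform
{
  public:
    explicit ToyTransform(std::size_t n) : _n(n) {}

    // Number of scratch elements forward() needs.
    std::size_t workspaceSize() const { return _n; }

    // Uses caller-provided scratch space, so nothing is allocated per call.
    // in and out may point to the same array.
    void forward(const AlignedComplex* in, AlignedComplex* out,
                 AlignedComplex* work) const
    {
        for (std::size_t i = 0; i < _n; ++i)
            work[i] = in[_n - 1 - i];
        for (std::size_t i = 0; i < _n; ++i)
            out[i] = work[i];
    }

  private:
    std::size_t _n;
};

// Allocates the workspace once and reuses it for every transform call.
std::vector<AlignedComplex> repeatedForward(std::vector<AlignedComplex> data,
                                            int repeats)
{
    ToyTransform transform(data.size());
    std::vector<AlignedComplex> work(transform.workspaceSize());
    for (int r = 0; r < repeats; ++r)
        transform.forward(data.data(), data.data(), work.data());
    return data;
}
```

Whether heFFTe actually allocates per call, and which of its entry points accept a user-provided buffer, would still need to be confirmed against its documentation before applying this pattern.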
@streeve @sslattery I made an issue here to try capturing potential improvements for FFT performance.
With #451 merged, can you start on these @sfogerty? Ideally, for each performance update you can run the test on each backend and show the improvement. I have a small Python script to compare if that's useful.
@sfogerty is it relatively straightforward to add the other optimization here?