2decomp-fft
ability to perform runtime autotuning
Ability to perform runtime autotuning of the process grid dimensions used to partition the global domain, and of the communication backends used for transpose and/or halo communication. This feature lets users run the library with the best-performing configuration for a given global domain size, number of tasks, and compute cluster topology. The autotuner aims to select the decomposition and communication backend options that minimize transpose and halo communication time.
See https://nvidia.github.io/cuDecomp/autotuning.html
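
The selection idea described above can be illustrated with a minimal sketch: enumerate every 2D process grid that factors the number of tasks, time each grid with each communication backend, and keep the fastest pair. This is not the cuDecomp or 2decomp-fft API. The names `candidate_grids`, `autotune`, and `measure`, and the backend labels in the usage note, are hypothetical. A real autotuner would run trial transposes or halo exchanges on the cluster instead of calling a user-supplied function.

```python
from typing import Callable, List, Sequence, Tuple

Grid = Tuple[int, int]  # (p_rows, p_cols), hypothetical naming


def candidate_grids(n_ranks: int) -> List[Grid]:
    """Return every (p_rows, p_cols) pair whose product equals n_ranks."""
    return [(r, n_ranks // r) for r in range(1, n_ranks + 1) if n_ranks % r == 0]


def autotune(
    n_ranks: int,
    backends: Sequence[str],
    measure: Callable[[Grid, str], float],
) -> Tuple[Grid, str, float]:
    """Exhaustively search grids x backends and return the fastest combination.

    `measure` is an assumed callback that returns the communication time
    (in any consistent unit) for one grid/backend pair.
    """
    best_time = float("inf")
    best_grid: Grid = (1, n_ranks)
    best_backend = backends[0]
    for grid in candidate_grids(n_ranks):
        for backend in backends:
            elapsed = measure(grid, backend)
            # Strict comparison keeps the first configuration found on ties.
            if elapsed < best_time:
                best_time, best_grid, best_backend = elapsed, grid, backend
    return best_grid, best_backend, best_time
```

In use, `measure` would wrap a timed trial run, for example `autotune(8, ["backend_a", "backend_b"], measure)`, where the two backend labels are placeholders. Exhaustive search is practical here because the number of factor pairs of the task count is small.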