bundled FFTw
I understand FFTw is necessary to be bundled for a WASM build (is that correct?), but on any other system, we must use the system FFTw, to be packagable. (Also, nobody wants to be the maintainer for a vendored FFTw)
No, it is not. Also, it is not the default implementation that is being used.
Unless there are any other objections, I'd remove the FFTw dependency entirely for the <simd>-FFT version.
The latter version is more portable, much faster (factor 2-5), and has mainstream support from GCC (current SIMD reference implementation) and largely LLVM (via the 'vir-simd' wrapper) as well. Since its pure C++, it has also a better chance of getting integrated w.r.t. GPU acceleration (via SYCL). For reference (CPU only, see also here):
I might be misinterpreting this output, but isn't it slower in all non-power-of-2 settings?
I might be misinterpreting this output, but isn't it slower in all non-power-of-2 settings?
The numbers are throughput (normalised to 'N log(N)')-- higher in the last column is better:
hey, I might still be misunderstanding things; I'm just comparing 1009 length FFT impls (since these are the only non-power-of-2 ones, and as I'm travelling I can't build the benchmark/test suite on the side right now, laptop has insufficient RAM to work with it simultaneously, but out of practical needs, I'd like to compare length-1536 (DAB+) and length-52 (various IEEE802.11 substandards) and a 20000-point FFT (low number of different, small, prime factors, relevant for specific analysis I need to do)).
I'd assume as prime number 1009 is a good if unlikely test case for ugly-as-it-gets not-power-of-2 FFT lengths!
@marcusmueller the non-power of two is realised via the Bluestein's algorithm. Not yet implemented (WIP) for the <simd>-FFT version. Stay tuned.