graph-prototype icon indicating copy to clipboard operation
graph-prototype copied to clipboard

bundled FFTw

Open marcusmueller opened this issue 4 months ago • 5 comments

I understand FFTw is necessary to be bundled for a WASM build (is that correct?), but on any other system, we must use the system FFTw, to be packagable. (Also, nobody wants to be the maintainer for a vendored FFTw)

marcusmueller avatar Sep 13 '25 19:09 marcusmueller

No, it is not. Also, it is not the default implementation that is being used.

Unless there are any other objections, I'd remove the FFTw dependency entirely for the <simd>-FFT version.

The latter version is more portable, much faster (factor 2-5), and has mainstream support from GCC (current SIMD reference implementation) and largely LLVM (via the 'vir-simd' wrapper) as well. Since its pure C++, it has also a better chance of getting integrated w.r.t. GPU acceleration (via SYCL). For reference (CPU only, see also here):

Image

RalphSteinhagen avatar Sep 15 '25 05:09 RalphSteinhagen

I might be misinterpreting this output, but isn't it slower in all non-power-of-2 settings?

marcusmueller avatar Sep 15 '25 22:09 marcusmueller

I might be misinterpreting this output, but isn't it slower in all non-power-of-2 settings?

The numbers are throughput (normalised to 'N log(N)')-- higher in the last column is better:

Image

RalphSteinhagen avatar Sep 16 '25 05:09 RalphSteinhagen

hey, I might still be misunderstanding things; I'm just comparing 1009 length FFT impls (since these are the only non-power-of-2 ones, and as I'm travelling I can't build the benchmark/test suite on the side right now, laptop has insufficient RAM to work with it simultaneously, but out of practical needs, I'd like to compare length-1536 (DAB+) and length-52 (various IEEE802.11 substandards) and a 20000-point FFT (low number of different, small, prime factors, relevant for specific analysis I need to do)).

I'd assume as prime number 1009 is a good if unlikely test case for ugly-as-it-gets not-power-of-2 FFT lengths!

marcusmueller avatar Sep 16 '25 06:09 marcusmueller

@marcusmueller the non-power of two is realised via the Bluestein's algorithm. Not yet implemented (WIP) for the <simd>-FFT version. Stay tuned.

RalphSteinhagen avatar Sep 16 '25 08:09 RalphSteinhagen