nfft Improve efficiency of NFFT direct transformation in one dimension.

Improve efficiency of NFFT direct transformation in one dimension.

Open jenskeiner opened this issue 6 months ago • 2 comments

Small change to improve the efficiencvy of the direct NFFT trafo in one dimension:

Instead of using memset to initialize the target vector with zeros, it's better to just write the final value to the target in the outer loop. Rationale: Using memset will need to access the entire target vector another time. This can be costly if the vector is large and does not fit inside the CPU cache.
Instead of accessing the target location f[j] in the inner loop in each iteration, it may be better to accumulate the value in a local variable and write only the final value to f[j]. Rationale: May reduce potentially slow memory access to f[j], but if f[j] is in the CPU cache and/or the compiler is smart, this may not make any difference.
Instead of calling the complex expontential cexp to calculate e^{-i*omega}, use real-valued sin and cos functions. Rationale: cexp supports complex-valued arguments, but the actual argument is always purely imaginary.

It was difficult to test this one because there's not simple benchmark I could quickly run. Also, I tested this on arm64/v8 and cycle.h currently doesn't work for me. So I had to set up a scratch file to run a quick check which is not part of this PR. On my platform, the number of cycles for the direct transform drops to 60-80% compared to before.

Would be good if someone could test this separately and on a different architecture (e.g. amd64) as well.

Aug 15 '24 14:08 jenskeiner

nfft nfft copied to clipboard

Improve efficiency of NFFT direct transformation in one dimension.

nfft
nfft copied to clipboard