nfft icon indicating copy to clipboard operation
nfft copied to clipboard

Improve efficiency of NFFT direct transformation in one dimension.

Open jenskeiner opened this issue 6 months ago • 2 comments

Small change to improve the efficiencvy of the direct NFFT trafo in one dimension:

  • Instead of using memset to initialize the target vector with zeros, it's better to just write the final value to the target in the outer loop. Rationale: Using memset will need to access the entire target vector another time. This can be costly if the vector is large and does not fit inside the CPU cache.
  • Instead of accessing the target location f[j] in the inner loop in each iteration, it may be better to accumulate the value in a local variable and write only the final value to f[j]. Rationale: May reduce potentially slow memory access to f[j], but if f[j] is in the CPU cache and/or the compiler is smart, this may not make any difference.
  • Instead of calling the complex expontential cexp to calculate e^{-i*omega}, use real-valued sin and cos functions. Rationale: cexp supports complex-valued arguments, but the actual argument is always purely imaginary.

It was difficult to test this one because there's not simple benchmark I could quickly run. Also, I tested this on arm64/v8 and cycle.h currently doesn't work for me. So I had to set up a scratch file to run a quick check which is not part of this PR. On my platform, the number of cycles for the direct transform drops to 60-80% compared to before.

Would be good if someone could test this separately and on a different architecture (e.g. amd64) as well.

jenskeiner avatar Aug 15 '24 14:08 jenskeiner