nfft
nfft copied to clipboard
Improve efficiency of NFFT direct transformation in one dimension.
Small change to improve the efficiencvy of the direct NFFT trafo in one dimension:
- Instead of using
memset
to initialize the target vector with zeros, it's better to just write the final value to the target in the outer loop. Rationale: Using memset will need to access the entire target vector another time. This can be costly if the vector is large and does not fit inside the CPU cache. - Instead of accessing the target location
f[j]
in the inner loop in each iteration, it may be better to accumulate the value in a local variable and write only the final value tof[j]
. Rationale: May reduce potentially slow memory access tof[j]
, but iff[j]
is in the CPU cache and/or the compiler is smart, this may not make any difference. - Instead of calling the complex expontential
cexp
to calculate e^{-i*omega}, use real-valuedsin
andcos
functions. Rationale:cexp
supports complex-valued arguments, but the actual argument is always purely imaginary.
It was difficult to test this one because there's not simple benchmark I could quickly run. Also, I tested this on arm64/v8 and cycle.h
currently doesn't work for me. So I had to set up a scratch file to run a quick check which is not part of this PR. On my platform, the number of cycles for the direct transform drops to 60-80% compared to before.
Would be good if someone could test this separately and on a different architecture (e.g. amd64) as well.