FFHT
FFHT copied to clipboard
Out-of-place FHT
Do you have any intention or interest in providing an out-of-place FHT? Many applications, such as Orthogonal JL Transform and random orthogonal embedding kernel methods, would benefit from having distinct input and output locations.
Can you copy and then run the in-place FHT?
Yes, though that requires two passes through the data and with accompanying memory accesses. It's probably not that bad with compiler-optimized memcpy, though it would be nice to avoid if possible. Then again, if your code is cache-optimized, adding another segment of memory to work through would probably ruin it.
Yes, it's a good point. I might add it, though I don't promise to do it soon. For now, you can just memcpy
(or even better, load
and store
using AVX intrinsics).
Thank you!
Does FFHT assume that the memory has been aligned?
The new version does not assume that!
Out of place is very easy. Go through the input data sequentially pair-wise and put the sum sequentially in the lower half of the new array and the difference sequentially in the upper half of the new array. Repeat log base 2 n times.
Two or 3 problems. For even number of steps the data ends up back where it came from.
It messes with the CPU L1/L2 caches by using twice as much memory which slows things down a lot. It needs the horizontal add/subtract SSE instructions which are a bit slow at least with legacy SSE which is all I have.
The good news it is excellent for getting the intermediate calculations of the WHT, which are wavelet like and can be used as such.