FFHT Out-of-place FHT

Do you have any intention or interest in providing an out-of-place FHT? Many applications, such as Orthogonal JL Transform and random orthogonal embedding kernel methods, would benefit from having distinct input and output locations.

Oct 09 '17 19:10 dnbaker

Can you copy and then run the in-place FHT?

Oct 09 '17 19:10 ilyaraz

Yes, though that requires two passes through the data and with accompanying memory accesses. It's probably not that bad with compiler-optimized memcpy, though it would be nice to avoid if possible. Then again, if your code is cache-optimized, adding another segment of memory to work through would probably ruin it.

Oct 09 '17 19:10 dnbaker

Yes, it's a good point. I might add it, though I don't promise to do it soon. For now, you can just memcpy (or even better, load and store using AVX intrinsics).

Oct 09 '17 19:10 ilyaraz

Thank you!

Does FFHT assume that the memory has been aligned?

Oct 09 '17 20:10 dnbaker

The new version does not assume that!

Oct 09 '17 21:10 ilyaraz

Out of place is very easy. Go through the input data sequentially pair-wise and put the sum sequentially in the lower half of the new array and the difference sequentially in the upper half of the new array. Repeat log base 2 n times.
Two or 3 problems. For even number of steps the data ends up back where it came from. It messes with the CPU L1/L2 caches by using twice as much memory which slows things down a lot. It needs the horizontal add/subtract SSE instructions which are a bit slow at least with legacy SSE which is all I have. The good news it is excellent for getting the intermediate calculations of the WHT, which are wavelet like and can be used as such.

Aug 09 '18 04:08 ghost

FFHT FFHT copied to clipboard

Out-of-place FHT

FFHT
FFHT copied to clipboard