Chris Taylor

Results: 33 comments by Chris Taylor

~5x faster for a 16-bit field, ~3x faster for an 8-bit field, hypothetically:

>>> affft.test_fft(8)
FFT performance for k=8 : 2304 adds, 2304 muladds
IFFT performance for k=8 : 2304 adds, 2304...

Thanks for reporting these, and sorry I'm slow to respond. I'm trying to stay focused on another long-term project and get it out the door.

Yeah for CI that would make more sense

Currently it only detects AVX2/SSSE3. I'll have to add support for a table lookup (super slow) fallback.

Another good option here is to use the Longhair/CRS trick and split up the buffers by bit and do everything with XOR. This can achieve nearly the same performance in...
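As a rough illustration of the bit-splitting idea, here is a minimal sketch. It is not Longhair's code. It assumes GF(2^8) with the polynomial 0x11D, which may not match the field representation used in the library. All names (`MulAddBitSliced`, `Slice`, `Unslice`) and the plane layout are hypothetical. Multiplying by a constant `c` is linear over GF(2), so once the data is stored as 8 bit-planes, output plane `i` is the XOR of every input plane `j` for which bit `i` of `c * x^j` is set.

```cpp
#include <stddef.h>
#include <stdint.h>

// Assumption: GF(2^8) with polynomial 0x11D, chosen only for illustration.
static const unsigned kPolynomial = 0x11D;

// Multiply a field element by x (that is, by 2) with modular reduction.
static uint8_t MulByX(uint8_t a)
{
    unsigned v = static_cast<unsigned>(a) << 1;
    if (v & 0x100) v ^= kPolynomial;
    return static_cast<uint8_t>(v);
}

// Plain bytewise multiply, used only to check the XOR-only version.
static uint8_t MultiplyReference(uint8_t a, uint8_t b)
{
    uint8_t product = 0;
    for (int j = 0; j < 8; ++j) {
        if (b & (1u << j)) product ^= a;  // add a * x^j
        a = MulByX(a);
    }
    return product;
}

static void XorRegion(uint8_t* dst, const uint8_t* src, size_t bytes)
{
    for (size_t i = 0; i < bytes; ++i) dst[i] ^= src[i];
}

// out += c * in, using only region XORs.
// Hypothetical layout: bit (s % 8) of planes[j][s / 8] holds bit j of symbol s.
static void MulAddBitSliced(uint8_t* const out[8], const uint8_t* const in[8],
                            uint8_t c, size_t plane_bytes)
{
    uint8_t column = c;  // column j of the 8x8 bit matrix is c * x^j
    for (size_t j = 0; j < 8; ++j) {
        for (size_t i = 0; i < 8; ++i)
            if (column & (1u << i))
                XorRegion(out[i], in[j], plane_bytes);
        column = MulByX(column);
    }
}

// Split bytes into bit-planes. The planes must be zeroed by the caller.
static void Slice(const uint8_t* symbols, size_t count, uint8_t* const planes[8])
{
    for (size_t s = 0; s < count; ++s)
        for (size_t j = 0; j < 8; ++j)
            if (symbols[s] & (1u << j))
                planes[j][s / 8] |= static_cast<uint8_t>(1u << (s % 8));
}

// Reassemble bytes from bit-planes.
static void Unslice(const uint8_t* const planes[8], size_t count, uint8_t* symbols)
{
    for (size_t s = 0; s < count; ++s) {
        uint8_t v = 0;
        for (size_t j = 0; j < 8; ++j)
            if ((planes[j][s / 8] >> (s % 8)) & 1u)
                v |= static_cast<uint8_t>(1u << j);
        symbols[s] = v;
    }
}
```

In a real codec the data would stay in the bit-plane layout the whole time, so the cost of `Slice` and `Unslice` would not sit on the hot path, and `XorRegion` would be the only inner loop that needs to be fast.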

Yeah using the reference version makes it run like 25x slower

Just pushed some fallbacks for the 8-bit version that are only 5x slower using this table: static ffe_t Multiply8LUT[256 * 256]; I copied some approaches that worked well for GF256...
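For readers unfamiliar with this kind of fallback, a full 256 x 256 product table could be filled and used roughly as follows. This is a sketch under assumptions, not the code that was pushed. It uses plain GF(2^8) multiplication with the polynomial 0x11D, which may differ from the library's actual field representation. `MultiplyReference`, `InitializeMultiply8LUT`, and `MulAddRegionLUT` are hypothetical names. Only `ffe_t` and `Multiply8LUT` come from the declaration above.

```cpp
#include <stddef.h>
#include <stdint.h>

typedef uint8_t ffe_t;  // finite field element for the 8-bit field

// Assumption: polynomial 0x11D, used here purely for illustration.
static const unsigned kPolynomial = 0x11D;

static ffe_t Multiply8LUT[256 * 256];

// Slow reference multiply: shift-and-XOR with modular reduction.
static ffe_t MultiplyReference(ffe_t a, ffe_t b)
{
    unsigned product = 0;
    unsigned x = a;
    for (unsigned y = b; y != 0; y >>= 1) {
        if (y & 1) product ^= x;
        x <<= 1;
        if (x & 0x100) x ^= kPolynomial;
    }
    return static_cast<ffe_t>(product);
}

// Fill the table once at startup: row m holds m * v for every v.
static void InitializeMultiply8LUT()
{
    for (unsigned m = 0; m < 256; ++m)
        for (unsigned v = 0; v < 256; ++v)
            Multiply8LUT[m * 256 + v] =
                MultiplyReference(static_cast<ffe_t>(m), static_cast<ffe_t>(v));
}

// y[i] ^= m * x[i] for a whole buffer, one table lookup per byte.
static void MulAddRegionLUT(ffe_t* y, const ffe_t* x, ffe_t m, size_t bytes)
{
    const ffe_t* row = &Multiply8LUT[m * 256];  // 256-byte row for constant m
    for (size_t i = 0; i < bytes; ++i) y[i] ^= row[x[i]];
}
```

Selecting the row for the constant multiplier once per buffer keeps the inner loop to a single load and XOR per byte, and the 256-byte row is small enough to stay in L1 cache.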

Relevant benchmarks from GF-Complete:

First one is representative of the current approach:
Region Best (MB/s): 1635.21 W-Method: 8 -m TABLE -r DOUBLE -

This is the XOR-only version that would...

Seems like a simple fix