Chris Taylor comments

Results: 33 comments by Chris Taylor

Investigate Frobenius Additive FFT

~5x faster for 16-bit field, ~3x faster for 8-bit field, hypothetically:
>>> affft.test_fft(8)
FFT performance for k=8 : 2304 adds, 2304 muladds
IFFT performance for k=8 : 2304 adds, 2304...

Investigate Frobenius Additive FFT

Maybe one day :)

Warnings from newer GCC versions

Thanks for reporting these, and sorry I'm slow to respond. I'm trying to stay focused on another long-term project and get it out the door.

Test(s) which terminate

Yeah, for CI that would make more sense.

Request: support of generic CPUs (w/o SSSE3)

Currently it only detects AVX2/SSSE3. I'll have to add support for a table lookup (super slow) fallback.

Request: support of generic CPUs (w/o SSSE3)

Another good option here is to use the Longhair/CRS trick and split up the buffers by bit and do everything with XOR. This can achieve nearly the same performance in...
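A minimal sketch of the bit-splitting idea, in Python for readability. It relies on the fact that multiplying by a constant in GF(2^8) is linear over GF(2), so if the data is stored as eight bit-planes, each output plane is just the XOR of some input planes. The polynomial 0x11D and the helper names (`gf256_mul`, `to_planes`, `from_planes`, `mul_planes`) are assumptions for illustration; this is not taken from Longhair or from the library discussed here, and the field representation used there may differ.

```python
POLY = 0x11D  # assumed reducing polynomial, for illustration only


def gf256_mul(a, b):
    """Bitwise shift-and-add multiply in GF(2^8), reduced by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def to_planes(data):
    """Split bytes into 8 bit-planes: bit n of planes[i] is bit i of data[n]."""
    planes = [0] * 8
    for n, byte in enumerate(data):
        for i in range(8):
            if (byte >> i) & 1:
                planes[i] |= 1 << n
    return planes


def from_planes(planes, length):
    """Inverse of to_planes: reassemble `length` bytes from 8 bit-planes."""
    out = bytearray(length)
    for i, plane in enumerate(planes):
        for n in range(length):
            if (plane >> n) & 1:
                out[n] |= 1 << i
    return out


def mul_planes(planes, y):
    """Multiply every symbol by the constant y using only whole-plane XORs."""
    out = [0] * 8
    for i in range(8):
        # Image of the basis element 2^i under multiplication by y.
        column = gf256_mul(1 << i, y)
        for j in range(8):
            if (column >> j) & 1:
                out[j] ^= planes[i]  # one wide XOR covers all symbols at once
    return out
```

Here each plane is a single Python integer, so `out[j] ^= planes[i]` stands in for a wide XOR over a whole buffer region. In a real codec the data would normally stay in the split layout throughout, so the `to_planes`/`from_planes` conversion is only there to check the result against ordinary byte-wise multiplication.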

Request: support of generic CPUs (w/o SSSE3)

Yeah, using the reference version makes it run about 25x slower.

Request: support of generic CPUs (w/o SSSE3)

Just pushed some fallbacks for the 8-bit version that are only 5x slower, using this table:
static ffe_t Multiply8LUT[256 * 256];
I copied some approaches that worked well for GF256...
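To illustrate what a flat 256 * 256 multiplication table buys, here is a hedged Python sketch. Only the table size comes from the comment above. The index layout `(y << 8) | x`, the polynomial 0x11D, and the names `gf256_mul_slow`, `build_mul_lut` and `muladd_region` are assumptions made for this example, and the actual fallback code may be organized differently.

```python
POLY = 0x11D  # assumed reducing polynomial, for illustration only


def gf256_mul_slow(a, b):
    """Bitwise shift-and-add multiply in GF(2^8), reduced by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def build_mul_lut():
    """Flat 256*256 table with lut[(y << 8) | x] == x * y (assumed layout)."""
    lut = bytearray(256 * 256)
    for y in range(256):
        base = y << 8
        for x in range(256):
            lut[base + x] = gf256_mul_slow(x, y)
    return lut


MUL_LUT = build_mul_lut()


def muladd_region(dst, src, y):
    """dst[i] ^= src[i] * y for every byte, one table lookup per byte."""
    start = y << 8
    row = MUL_LUT[start:start + 256]  # the 256-entry row for this constant
    for i, s in enumerate(src):
        dst[i] ^= row[s]
```

Selecting the row for `y` once per region means the inner loop is a single indexed load and an XOR per byte, with no per-bit work. That is the general reason a table fallback can beat a bitwise reference multiply while still trailing a SIMD shuffle-based kernel.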

Request: support of generic CPUs (w/o SSSE3)

Relevant benchmarks from GF-Complete. The first one is representative of the current approach:
Region Best (MB/s): 1635.21 W-Method: 8 -m TABLE -r DOUBLE
This is the XOR-only version that would...

GCC ≤ 4.6 build fails

Seems like a simple fix