fbarchard
The bfly functions do multiple loads and stores of Fout, e.g. `C_ADDTO(*Fout,scratch[3]);`. Code size and performance are improved by loading Fout at the top of the loop and storing it...
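A minimal sketch of the load-once, store-once idea, not kissfft's actual butterfly code. The `cpx` type, the simplified `C_ADDTO` and both function names are assumptions made for illustration; the real macro and types depend on how kissfft is configured.

```c
#include <assert.h>

/* Hypothetical stand-in for kissfft's complex type. */
typedef struct { int r; int i; } cpx;

/* Simplified macro modeled on C_ADDTO: res += a, component-wise. */
#define C_ADDTO(res, a) do { (res).r += (a).r; (res).i += (a).i; } while (0)

/* Pattern from the issue: every C_ADDTO goes through the pointer, so the
   compiler may reload and store *Fout each time if it cannot rule out
   aliasing between Fout and scratch. */
void accumulate_through_pointer(cpx *Fout, const cpx *scratch, int n) {
    assert(n >= 0);
    for (int k = 0; k < n; ++k) {
        C_ADDTO(*Fout, scratch[k]);
    }
}

/* Suggested pattern: load Fout once, accumulate in a local, store once. */
void accumulate_in_local(cpx *Fout, const cpx *scratch, int n) {
    assert(n >= 0);
    cpx acc = *Fout;              /* single load */
    for (int k = 0; k < n; ++k) {
        C_ADDTO(acc, scratch[k]); /* works on the local copy only */
    }
    *Fout = acc;                  /* single store */
}
```

Both functions compute the same sum as long as `Fout` does not point into `scratch`; the second only makes the single load and store explicit instead of relying on the optimizer.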
Neon
kissfft on ARMv7 with fixed point is slow. A NEON version would improve performance quite a bit. 1. there aren't enough registers, so the stack is used 2. shifts and...
Could you add a prefetch instruction for WASM? The syntax would be similar to a load. The load instruction uses a `local.get` for the base pointer, and an immediate offset:,...
The current fixed point DIVSCALAR multiplies by SAMP_MAX which is off by 1
```
# define DIVSCALAR(x,k) \
    (x) = sround( smul( x, SAMP_MAX/k ) )
```
which produces 0.999938965...
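The arithmetic can be checked with a small sketch. It assumes the 16-bit fixed point configuration (`FRACBITS` of 15, `SAMP_MAX` of 32767) and a rounding helper modeled on `sround`; the `(SAMP_MAX + 1) / k` variant is shown only as one possible adjustment, not necessarily the fix that was merged. For `k = 2` the current scale factor is 16383 rather than 16384, and 16383/16384 is about 0.999938965.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 16-bit fixed point configuration. */
#define FRACBITS 15
#define SAMP_MAX 32767

/* Round a Q15 product back to 16 bits (modeled on sround). */
int16_t round_q15(int32_t x) {
    return (int16_t)((x + (1 << (FRACBITS - 1))) >> FRACBITS);
}

/* Current behaviour: scale by SAMP_MAX / k, e.g. 16383 for k = 2. */
int16_t div_current(int16_t x, int k) {
    assert(k > 0 && x >= 0); /* sketch covers non-negative inputs only */
    return round_q15((int32_t)x * (SAMP_MAX / k));
}

/* Hypothetical adjustment: scale by (SAMP_MAX + 1) / k, e.g. 16384 for k = 2. */
int16_t div_adjusted(int16_t x, int k) {
    assert(k > 0 && x >= 0);
    return round_q15((int32_t)x * ((SAMP_MAX + 1) / k));
}
```

With these definitions, `div_current(20000, 2)` gives 9999 while `div_adjusted(20000, 2)` gives 10000.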
An average function written with intrinsics produces inaccurate values when optimized. It works with `-O0` but fails with all levels of optimization: `-Os`, `-Oz`, `-O1`, `-O2`. It also works when...
The string size delimiter expects int, so cast the parameter from long to int. The initializer for the struct expects 2 values, so pass 0 for the pointer and 0LL for the long long.
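A sketch of both fixes, assuming the "string size delimiter" is the `*` precision in a `%.*s` format, which takes an `int` argument. The `entry` struct and both function names are hypothetical and only illustrate the pattern; they are not taken from the patched code.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct with a pointer member and a long long member. */
struct entry {
    const char *name;
    long long value;
};

/* The '*' precision in "%.*s" expects an int, so a long length is cast. */
int format_prefix(char *out, size_t out_size, const char *s, long len) {
    assert(len >= 0 && len <= INT_MAX); /* the cast below must not truncate */
    return snprintf(out, out_size, "%.*s", (int)len, s);
}

/* One initializer per member: 0 for the pointer, 0LL for the long long. */
struct entry empty_entry(void) {
    struct entry e = { 0, 0LL };
    return e;
}
```

Calling `format_prefix(buf, sizeof buf, "abcdef", 3L)` writes `abc` into `buf` and returns 3.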
Fix for DIVSCALAR off by 1 (Fixes #83)
Fix cpuinfo_x86_normalize_brand_string unannotated fall-through warning
Disable I8MM on clang earlier than 11. Fixes #6246
F32 and F16 are missing vbinary benchmarks. There are vunary benchmarks which are generated from tests. And a few have 8 bit tests:
qs8-vadd.cc:#include
qs8-vaddc.cc:#include
qs8-vmul.cc:#include
qs8-vmulc.cc:#include
qu8-vadd.cc:#include
qu8-vaddc.cc:#include
qu8-vmul.cc:#include...