quadiron
Use 16-bit arithmetic for F_4
Instead of using 32-bit arithmetic, we can use 16-bit arithmetic.
Here is a simulation: https://github.com/vrancurel/f4mul
Lâm, can you give it a try?
We can probably further optimize the adjustment loop by doing it with SIMD operations as well, e.g. with cmpeq_epu16(), etc.
Bitfield operations are in fact unlikely, so we can process them only if the bitmap != 0.
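To make the idea concrete, here is a rough scalar sketch of what a 16-bit multiplication could look like. It assumes F_4 means the Fermat number 2^16 + 1 = 65537, and that the bitmap marks the one value that does not fit in 16 bits (65536, i.e. -1 mod F_4). The names `Elem`, `encode`, `decode`, `neg`, `mul` and `self_check` are hypothetical and are not taken from quadiron or f4mul. The sketch relies on 2^16 ≡ -1 (mod F_4), so a product hi * 2^16 + lo reduces to lo - hi.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: F_4 is the Fermat number 2^16 + 1.
constexpr uint32_t kF4 = 65537;

// Hypothetical representation: a 16-bit lane plus one flag bit.
// The flag stands in for one bit of the bitmap and marks the value 65536.
struct Elem {
    uint16_t lo;
    bool is_max;  // true means the element is 65536 (== -1 mod F_4)
};

Elem encode(uint32_t v) {
    return v == 65536 ? Elem{0, true} : Elem{static_cast<uint16_t>(v), false};
}

uint32_t decode(Elem e) { return e.is_max ? 65536u : e.lo; }

// Negation mod F_4, only needed on the rare flagged path.
Elem neg(Elem a) {
    if (a.is_max) return Elem{1, false};  // -(-1) = 1
    if (a.lo == 0) return a;              // -0 = 0
    if (a.lo == 1) return Elem{0, true};  // -1 = 65536
    return Elem{static_cast<uint16_t>(kF4 - a.lo), false};
}

Elem mul(Elem a, Elem b) {
    // Rare path: one operand is 65536, so the product is a negation.
    if (a.is_max) return neg(b);
    if (b.is_max) return neg(a);

    // Common path. With SSE2 the two halves could come from
    // _mm_mullo_epi16 and _mm_mulhi_epu16; here they are split by hand.
    const uint32_t p = static_cast<uint32_t>(a.lo) * b.lo;
    const uint16_t lo = static_cast<uint16_t>(p & 0xFFFF);
    const uint16_t hi = static_cast<uint16_t>(p >> 16);

    uint16_t r = static_cast<uint16_t>(lo - hi);  // wraps mod 2^16
    if (lo >= hi) return Elem{r, false};

    // lo < hi: add F_4, which is +1 in wrapping 16-bit arithmetic.
    r = static_cast<uint16_t>(r + 1);
    // r == 0 here means the true result is 65536, so set the flag.
    return Elem{r, r == 0};
}

// Compare against plain 64-bit modular multiplication on edge operands.
void self_check() {
    const uint32_t edges[] = {0, 1, 2, 32768, 65535, 65536};
    for (uint32_t a : edges) {
        for (uint32_t b = 0; b <= 65536; ++b) {
            const uint32_t expected =
                static_cast<uint32_t>((static_cast<uint64_t>(a) * b) % kF4);
            assert(decode(mul(encode(a), encode(b))) == expected);
        }
    }
}
```

In a vectorized version the `lo < hi` adjustment is the part a 16-bit compare could handle, and the flagged path would only run when the bitmap for the current vector is non-zero.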