Chris de Vries

2 comments by Chris de Vries

We could also try mixing a SIMD Hamming distance, which runs on the FPU/vector units, with scalar POPCNT, which runs on the integer ALU, so that all of the CPU's execution units stay busy.
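A rough sketch of the arithmetic behind that idea, in Python for readability only. The Hamming distance between two bit vectors is the popcount of their XOR, and the sum can be split into two independent partial sums, which is what would let one part go to a SIMD kernel and the other to scalar POPCNT. The function names and the `simd_share` parameter are hypothetical and not from any existing code; both partitions use the same pure-Python popcount here, so this shows only the partitioning, not the instruction-level overlap. A native version would presumably interleave the two paths in one loop rather than run them back to back.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def hamming_words(a, b):
    """Hamming distance between two equal-length sequences of 64-bit words."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same number of words")
    # XOR leaves a 1 wherever the two words differ; popcount counts those bits.
    return sum(popcount(x ^ y) for x, y in zip(a, b))


def hamming_split(a, b, simd_share=0.5):
    """Same distance, computed as two independent partial sums.

    simd_share is a hypothetical tuning knob: the fraction of words a native
    implementation would hand to the SIMD path, with the rest going to
    scalar POPCNT.
    """
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same number of words")
    cut = int(len(a) * simd_share)
    simd_part = hamming_words(a[:cut], b[:cut])    # stand-in for the SIMD path
    scalar_part = hamming_words(a[cut:], b[cut:])  # stand-in for scalar POPCNT
    # The partial sums share no state, so they can be accumulated separately.
    return simd_part + scalar_part
```

The best split would have to be found by benchmarking on the target CPU, since it depends on how many vector and integer ports are available and on how well the two instruction streams overlap.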

This is likely to be more performant than our naive vector implementations because it uses BLAS.