Chris de Vries comments


You can also try mixing a SIMD Hamming distance, which runs on the vector (FPU) units, with scalar POPCNT, which runs on the integer ALU, to keep all of the CPU's execution units busy.
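A rough sketch of what that mixing could look like in C, in case it helps. Everything here is an assumption for illustration, not existing code: the names `hamming_scalar`, `popcount_bytes_sse2` and `hamming_mixed` are made up, the vector half uses a plain SSE2 bit-counting trick rather than any particular tuned kernel, and the 16/16 byte split between the two paths is arbitrary. It relies on the GCC/Clang `__builtin_popcountll` builtin, which only becomes a real POPCNT instruction when the target enables it (e.g. `-mpopcnt`). Whether the interleaving actually wins depends on the microarchitecture, so it would need benchmarking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar path: XOR 8 bytes at a time and count set bits with the
   compiler builtin (hypothetical helper, GCC/Clang only). */
static uint64_t hamming_scalar(const uint8_t *a, const uint8_t *b, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t x, y;
        memcpy(&x, a + i, 8);   /* memcpy avoids unaligned-access UB */
        memcpy(&y, b + i, 8);
        total += (uint64_t)__builtin_popcountll(x ^ y);
    }
    for (; i < n; i++)          /* leftover bytes */
        total += (uint64_t)__builtin_popcount((unsigned)(a[i] ^ b[i]));
    return total;
}

#if defined(__SSE2__)
/* Vector path: per-byte bit count of a 128-bit value, then a horizontal
   byte sum into two 64-bit lanes via _mm_sad_epu8. */
static __m128i popcount_bytes_sse2(__m128i v) {
    const __m128i m1 = _mm_set1_epi8(0x55);
    const __m128i m2 = _mm_set1_epi8(0x33);
    const __m128i m4 = _mm_set1_epi8(0x0f);
    v = _mm_sub_epi8(v, _mm_and_si128(_mm_srli_epi16(v, 1), m1));
    v = _mm_add_epi8(_mm_and_si128(v, m2),
                     _mm_and_si128(_mm_srli_epi16(v, 2), m2));
    v = _mm_and_si128(_mm_add_epi8(v, _mm_srli_epi16(v, 4)), m4);
    return _mm_sad_epu8(v, _mm_setzero_si128());
}
#endif

/* Mixed path: each iteration handles 16 bytes on the vector units and the
   next 16 bytes with scalar popcount, so the two instruction streams are
   independent and can overlap. Falls back to scalar only without SSE2. */
uint64_t hamming_mixed(const uint8_t *a, const uint8_t *b, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
#if defined(__SSE2__)
    __m128i vacc = _mm_setzero_si128();
    for (; i + 32 <= n; i += 32) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        vacc = _mm_add_epi64(vacc, popcount_bytes_sse2(_mm_xor_si128(va, vb)));
        total += hamming_scalar(a + i + 16, b + i + 16, 16);
    }
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, vacc);
    total += lanes[0] + lanes[1];
#endif
    /* Tail shorter than 32 bytes (or everything, without SSE2). */
    return total + hamming_scalar(a + i, b + i, n - i);
}
```

Calling `hamming_mixed(a, b, n)` on two byte buffers of length `n` should give the same count as `hamming_scalar(a, b, n)`. The ratio of vector to scalar work per iteration is the knob to tune.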

This is likely to be more performant than our naive vector implementations because it uses BLAS.