SIMD performance comparison

Open chasingegg opened this issue 3 years ago • 4 comments

Hi, it is really useful to see how SIMD works for 4-bit PQ in simulate_kernels_PQ4. However, I'm wondering why the first attempt is worse than the scann implementation, since the first attempt seems to have fewer loops. Also, is there a real performance comparison, rather than just this simulation?

chasingegg avatar Nov 10 '22 12:11 chasingegg
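
For readers unfamiliar with the idea behind the question above, here is a minimal NumPy sketch of 4-bit PQ distance accumulation, where a 16-entry table lookup stands in for a SIMD byte shuffle. It is not taken from simulate_kernels_PQ4; the names `lookup16` and `pq4_distances`, the array shapes, and the uint8 lookup tables are assumptions made for illustration only.

```python
import numpy as np

def lookup16(table, idx):
    """Emulate a 16-entry byte shuffle: out[i] = table[idx[i] & 15]."""
    return table[idx & 15]

def pq4_distances(codes, luts):
    """Accumulate quantized distances for 4-bit PQ codes.

    codes: (n, M) uint8 array, values in 0..15 (one code per sub-quantizer).
    luts:  (M, 16) uint8 array of quantized per-sub-quantizer distances.
    Returns an (n,) uint16 array of accumulated distances.
    """
    n, M = codes.shape
    acc = np.zeros(n, dtype=np.uint16)  # wider accumulator avoids 8-bit overflow
    for m in range(M):
        # One table lookup per sub-quantizer, applied to all vectors at once.
        acc += lookup16(luts[m], codes[:, m])
    return acc

# Hypothetical data: 32 vectors, 8 sub-quantizers.
rng = np.random.default_rng(0)
codes = rng.integers(0, 16, size=(32, 8), dtype=np.uint8)
luts = rng.integers(0, 256, size=(8, 16), dtype=np.uint8)
dist = pq4_distances(codes, luts)
```

The sketch only models the arithmetic. It says nothing about instruction counts or memory access patterns, which is where a real C implementation would differ between layouts.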

I did the comparison (in C of course) and it was much slower than the code layout used in scann.

mdouze avatar Nov 11 '22 11:11 mdouze

Another question: why do we have this code layout (0, 8, 1, 9, ...)? Does the sequence affect efficiency? Why don't we put 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 in the first 4 bits of each byte in one 128-bit register?

chasingegg avatar Dec 03 '22 15:12 chasingegg
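
To make the layout question above concrete, the following sketch packs 32 4-bit codes into 16 bytes in two different orders and applies the same emulated lookup to both. Several things here are assumptions: the order is assumed to continue as 2, 10, ..., 7, 15 after the "0, 8, 1, 9" given in the question, the pairing of vector `order[j]` with vector `order[j] + 16` in one byte is a hypothetical choice, and `pack_block` and `scan_block` are illustrative names, not Faiss functions.

```python
import numpy as np

# Assumed continuation of the "0, 8, 1, 9, ..." pattern from the question.
PERM = np.array([0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15])

def pack_block(codes32, order):
    """Pack 32 4-bit codes into 16 bytes.

    Byte j holds vector order[j] in its low nibble and vector
    order[j] + 16 in its high nibble (hypothetical pairing).
    """
    codes32 = np.asarray(codes32, dtype=np.uint8)
    lo = codes32[order]
    hi = codes32[order + 16]
    return (lo | (hi << 4)).astype(np.uint8)

def scan_block(packed, lut):
    """Emulate two 16-entry shuffles: one on low nibbles, one on high nibbles."""
    lo = lut[packed & 15]
    hi = lut[packed >> 4]
    return lo, hi

# Hypothetical data: one block of 32 codes and one 16-entry lookup table.
rng = np.random.default_rng(0)
codes32 = rng.integers(0, 16, size=32, dtype=np.uint8)
lut = rng.integers(0, 256, size=16, dtype=np.uint8)

natural = np.arange(16)
lo_n, hi_n = scan_block(pack_block(codes32, natural), lut)
lo_p, hi_p = scan_block(pack_block(codes32, PERM), lut)

# Same values in both layouts, only in a different lane order.
same_values = np.array_equal(lo_p, lo_n[PERM]) and np.array_equal(hi_p, hi_n[PERM])
```

In this emulation the two layouts produce identical lookup results up to lane order, so the sketch cannot show whether one order is faster. It does not model the steps that follow the lookup in a real kernel, such as widening the 8-bit results and accumulating them.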

@mdouze Do you know the reason...?

chasingegg avatar Dec 15 '22 06:12 chasingegg

I got the same result: it is slower with SIMD.

ryankeke avatar Aug 04 '23 09:08 ryankeke