faiss icon indicating copy to clipboard operation
faiss copied to clipboard

Upgrade IVFFlatScanner and exhaustive_L2sqr_seq

Open alexanderguzhva opened this issue 2 years ago • 14 comments

Speeds up IVFFlatScanner by helping the branch predictor and computing distances for 4 elements per loop. Same upgrade for exhaustive_inner_product_seq() and exhaustive_L2sqr_seq() calls.

alexanderguzhva avatar Sep 25 '23 19:09 alexanderguzhva

@mdouze Matthijs, could you please check whether this test failure is critical. It seems that it is about epsilons on certain architectures. On the other hand, it is seems that it is the code for dedup-ing.

alexanderguzhva avatar Sep 26 '23 22:09 alexanderguzhva

@mdouze , I've upgraded the code, please take a look, whether it is simple enough or whether it needs to simplified further

alexanderguzhva avatar Oct 02 '23 01:10 alexanderguzhva

OK the current version is complex but less repetitive. I'll import to fbcode.

Weird that the errors still appear. I thought I relaxed these tests. Did you rebase on latest Faiss?

mdouze avatar Oct 02 '23 20:10 mdouze

@mdouze has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Oct 02 '23 20:10 facebook-github-bot

Do you have numbers (even very rough) that show where this modification speeds up search?

mdouze avatar Oct 03 '23 07:10 mdouze

@mdouze , my mistake, let me rebase and run the test

alexanderguzhva avatar Oct 03 '23 19:10 alexanderguzhva

I've used benchs/bench_ivf_selector.cpp, fixed its errors (missing #include <faiss/impl/FaissAssert.h>) and made sure that cmake version compiles the code for AVX2, and used IVF1024,Flat index (instead of a default IVF1024,SQ8).

Two cases: d=64 and d=768. I'm not sure whether the results are super reliable for d=64 (I'm running one on a VM), but it seems that it became a bit slower, while for d=768 the speedup is visible and reproducible.

I think that I either need to add some threshold for d (say, starting from d=512, otherwise the new version is plain slower), or make it optional, or check the code further. So, please don't land this diff yet.

d=64, Baseline:

index_key=IVF1024,Flat
Read /tmp/bench_ivf_selector_IVF1024,Flat.faissindex
parallel_mode=0
search time, no selector: 89.960 ms
search time with nullptr selector: 247.900 ms
search time with selector: 244.995 ms
search time with null selector + manual parallel: 246.943 ms
parallel_mode=3
search time, no selector: 85.567 ms
search time with nullptr selector: 85.795 ms
search time with selector: 85.635 ms
search time with null selector + manual parallel: 87.590 ms
set single thread
parallel_mode=0
search time, no selector: 259.159 ms
search time with nullptr selector: 226.071 ms
search time with selector: 221.616 ms
search time with null selector + manual parallel: 221.348 ms

d=64, Candidate:

index_key=IVF1024,Flat
Read /tmp/bench_ivf_selector_IVF1024,Flat.faissindex
parallel_mode=0
search time, no selector: 93.027 ms
search time with nullptr selector: 252.398 ms
search time with selector: 247.649 ms
search time with null selector + manual parallel: 248.685 ms
parallel_mode=3
search time, no selector: 88.478 ms
search time with nullptr selector: 87.638 ms
search time with selector: 89.603 ms
search time with null selector + manual parallel: 91.405 ms
set single thread
parallel_mode=0
search time, no selector: 254.079 ms
search time with nullptr selector: 218.183 ms
search time with selector: 229.076 ms
search time with null selector + manual parallel: 217.755 ms

d=768, Baseline:

index_key=IVF1024,Flat
Read /tmp/0bench_ivf_selector_IVF1024,Flat.faissindex
parallel_mode=0
search time, no selector: 938.073 ms
search time with nullptr selector: 2196.695 ms
search time with selector: 2214.705 ms
search time with null selector + manual parallel: 2249.250 ms
parallel_mode=3
search time, no selector: 928.345 ms
search time with nullptr selector: 930.376 ms
search time with selector: 937.659 ms
search time with null selector + manual parallel: 934.082 ms
set single thread
parallel_mode=0
search time, no selector: 2281.896 ms
search time with nullptr selector: 2211.583 ms
search time with selector: 2216.971 ms
search time with null selector + manual parallel: 2221.527 ms

d=768, Candidate:

index_key=IVF1024,Flat
Read /tmp/0bench_ivf_selector_IVF1024,Flat.faissindex
parallel_mode=0
search time, no selector: 893.283 ms
search time with nullptr selector: 1847.295 ms
search time with selector: 1873.944 ms
search time with null selector + manual parallel: 1932.494 ms
parallel_mode=3
search time, no selector: 902.230 ms
search time with nullptr selector: 889.027 ms
search time with selector: 891.625 ms
search time with null selector + manual parallel: 901.839 ms
set single thread
parallel_mode=0
search time, no selector: 1885.094 ms
search time with nullptr selector: 1861.240 ms
search time with selector: 1846.442 ms
search time with null selector + manual parallel: 1849.544 ms

alexanderguzhva avatar Oct 03 '23 21:10 alexanderguzhva

@mdouze , additionally, I will modify fvec_L2sqr_by_idx() and fvec_inner_products_by_idx() in the same way, because it helps IndexRefineFlat

alexanderguzhva avatar Oct 03 '23 21:10 alexanderguzhva

I'm still playing with it

alexanderguzhva avatar Oct 09 '23 12:10 alexanderguzhva

What is the current status of this PR?

mdouze avatar Nov 28 '23 10:11 mdouze

@mdouze Technically, it works and I can rebase it on the master branch. But this diff improves the performance for large dim datasets, but may degrade the performance for smaller dim datasets. I've enabled it in Zilliz version of faiss, because it mostly operates with dim like 512+. But I'm not sure for the general purpose case.

Is the benchmarking set that Gergely works on ready to be tried?

alexanderguzhva avatar Nov 28 '23 17:11 alexanderguzhva

hi @alexanderguzhva , there are some conflicts which would you like to resolve them first?

junjieqi avatar May 23 '24 16:05 junjieqi

@junjieqi I'll need to reiterate this diff, because it improves the performance in certain cases and hurts in others.

alexanderguzhva avatar May 24 '24 02:05 alexanderguzhva

Let's consider this diff is a draft for now.

mdouze avatar May 28 '24 16:05 mdouze