faiss icon indicating copy to clipboard operation
faiss copied to clipboard

new parallel mode for IVFFLAT

Open phosphorylation opened this issue 2 years ago • 1 comments

Summary

This is a petition for a pull request. There are some very good blas distance functions in FLAT, but they aren't used at all in IVFFLAT. This is because all current parallel mode of IVFFLAT process queries one by one. To boost performance and utilize these blas implementations I propose a new pmode=4 route for IVFFLAT index: we parallelize over each list, group together all the queries that need to scan the current list, and batch scan the current list using all these affiliated queries. This way, we might get nq for each list to be larger than distance_compute_blas_threshold so we might use knn_{L2sqr/inner_product}_blas. I already implemented this and did a crude benchmark on SIFT1M:

running on Intel(R) Xeon(R) CPU E5-2683 v4. SIFT1M nlist=1024 k=50 single thread case

pmode=0 pmode=4
nprobe nq
16 500 1.290s 0.776s
16 1000 2.579s 0.729s
16 10000 25.667s 3.896s
32 500 2.432s 0.767s
32 1000 5.062s 0.864s
32 10000 49.453s 7.454s

omp_num_threads=8 case

pmode=0 pmode=4
nprobe nq
16 500 0.157s 0.096s
16 1000 0.303s 0.101s
16 10000 2.805s 0.491s
32 500 0.289s 0.095s
32 1000 0.576s 0.122s
32 10000 5.495s 0.853s

Overall, this pmode performes better as your (nprobe*nq/nlist) increase. as I tested it on SIFT10M, it could provide 10X performance compares to pmode=0 under certain conditions. If you guys are interested I'll make a pull request.

Platform

OS: Ubuntu-16.04.1

Faiss version: 1.7.1

Installed from: compiled myself

Faiss compilation options: -DBLA_VENDOR=Intel10_64_dyn -DCMAKE_BUILD_TYPE=Release -DFAISS_OPT_LEVEL=avx2

Running on:

  • [ x] CPU

Interface:

  • [x ] C++

Reproduction instructions

phosphorylation avatar Apr 25 '22 03:04 phosphorylation