faiss
faiss copied to clipboard
new parallel mode for IVFFLAT
Summary
This is a petition for a pull request. There are some very good blas distance functions in FLAT, but they aren't used at all in IVFFLAT. This is because all current parallel mode of IVFFLAT process queries one by one. To boost performance and utilize these blas implementations I propose a new pmode=4 route for IVFFLAT index: we parallelize over each list, group together all the queries that need to scan the current list, and batch scan the current list using all these affiliated queries. This way, we might get nq for each list to be larger than distance_compute_blas_threshold so we might use knn_{L2sqr/inner_product}_blas. I already implemented this and did a crude benchmark on SIFT1M:
running on Intel(R) Xeon(R) CPU E5-2683 v4. SIFT1M nlist=1024 k=50 single thread case
pmode=0 | pmode=4 | ||
---|---|---|---|
nprobe | nq | ||
16 | 500 | 1.290s | 0.776s |
16 | 1000 | 2.579s | 0.729s |
16 | 10000 | 25.667s | 3.896s |
32 | 500 | 2.432s | 0.767s |
32 | 1000 | 5.062s | 0.864s |
32 | 10000 | 49.453s | 7.454s |
omp_num_threads=8 case
pmode=0 | pmode=4 | ||
---|---|---|---|
nprobe | nq | ||
16 | 500 | 0.157s | 0.096s |
16 | 1000 | 0.303s | 0.101s |
16 | 10000 | 2.805s | 0.491s |
32 | 500 | 0.289s | 0.095s |
32 | 1000 | 0.576s | 0.122s |
32 | 10000 | 5.495s | 0.853s |
Overall, this pmode performes better as your (nprobe*nq/nlist) increase. as I tested it on SIFT10M, it could provide 10X performance compares to pmode=0 under certain conditions. If you guys are interested I'll make a pull request.
Platform
OS: Ubuntu-16.04.1
Faiss version: 1.7.1
Installed from: compiled myself
Faiss compilation options: -DBLA_VENDOR=Intel10_64_dyn -DCMAKE_BUILD_TYPE=Release -DFAISS_OPT_LEVEL=avx2
Running on:
- [ x] CPU
Interface:
- [x ] C++