HNSW Flat Poor performance while Performance Testing

Open IcanDoItL opened this issue 1 year ago • 7 comments

I use IndexHNSWFlat to build the index.

hnsw_index = faiss.IndexHNSWFlat(128, 32, faiss.METRIC_INNER_PRODUCT)
hnsw_index.hnsw.efConstruction = 64
hnsw_index = faiss.read_index('faiss.index', faiss.IO_FLAG_MMAP)
hnsw_index.hnsw.efSearch = 256
hnsw_index.search(query_embedding, 10)

When I run a stress test with JMeter, performance is poor: there are many context switches and system calls (image). System call statistics from strace -c -p pid: (image)

Summary

Platform

OS: Linux

Faiss version: faiss-cpu = 1.8.0

Installed from: anaconda

Faiss compilation options:

Running on:

  • [*] CPU
  • [ ] GPU

Interface:

  • [ ] C++
  • [*] Python

Reproduction instructions

IcanDoItL avatar Aug 20 '24 03:08 IcanDoItL

This is multithreaded code based on https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors. When it calls the hnsw method, watch the output of vmstat 1 1000 once 'search' is printed. When it calls the ivf method, the problem does not appear.

import faiss
import concurrent.futures

try:
    from faiss.contrib.datasets_fb import DatasetSIFT1M
except ImportError:
    from faiss.contrib.datasets import DatasetSIFT1M

k = 10
print("load data")
ds = DatasetSIFT1M()
xq = ds.get_queries()
xb = ds.get_database()
xt = ds.get_train()
nq, d = xq.shape


def faiss_search(index, xq):
    index.search(xq, k)


def ivf():
    print("Testing IVF Flat (baseline)")
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFFlat(quantizer, d, 16384)
    index.cp.min_points_per_centroid = 5  # quiet warning

    # to see progress
    index.verbose = True

    print("training")
    index.train(xt)

    print("add")
    index.add(xb)

    print("search")
    index.nprobe = 256
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        for i in range(3):
            for q in xq:
                executor.submit(faiss_search, index, q.reshape(1, -1))


def hnsw():
    print("Testing HNSW Flat")
    index = faiss.IndexHNSWFlat(d, 32)
    index.hnsw.efConstruction = 40

    print("add")
    index.verbose = True
    index.add(xb)

    index.hnsw.search_bounded_queue = True
    index.hnsw.efSearch = 256
    print("search")

    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        for i in range(3):
            for q in xq:
                executor.submit(faiss_search, index, q.reshape(1, -1))


if __name__ == '__main__':
    ivf()
    hnsw()

image

IcanDoItL avatar Aug 20 '24 11:08 IcanDoItL

@IcanDoItL faiss.IO_FLAG_MMAP

alexanderguzhva avatar Aug 20 '24 13:08 alexanderguzhva

@alexanderguzhva I tried, it doesn't work.

hnsw_index = faiss.read_index('faiss.index', faiss.IO_FLAG_MMAP)
hnsw_index.hnsw.efSearch = 256
hnsw_index.search(query_embedding, 10)

IcanDoItL avatar Aug 21 '24 01:08 IcanDoItL

This problem looks similar to https://github.com/erikbern/ann-benchmarks/issues/47

IcanDoItL avatar Aug 21 '24 09:08 IcanDoItL

Normally one would parallelize the search using Faiss' internal threading. Python threading unavoidably incurs some overhead. The slowdown may be because, irrespective of the number of queries, OpenMP spawns new threads, which interact badly with the Python threads. So the code here:

https://github.com/facebookresearch/faiss/blob/main/faiss/IndexHNSW.cpp#L266

should be parallelized only if the number of queries is > 1 (and presumably something larger than 1)

mdouze avatar Aug 28 '24 10:08 mdouze

For IVF search, this check is performed here: https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVF.cpp#L447
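The kind of check being discussed is a guard of the form "only go parallel when there is more than one query". The following is a Python analogy of that idea, not the Faiss source; search_all, search_one, and min_parallel are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(search_one, queries, pool, min_parallel=2):
    """Apply search_one to each query, using the pool only for larger batches."""
    if len(queries) < min_parallel:
        # Small batch: run in the calling thread and skip the pool entirely.
        return [search_one(q) for q in queries]
    # Larger batch: fan out across the worker threads, preserving order.
    return list(pool.map(search_one, queries))


with ThreadPoolExecutor(max_workers=4) as pool:
    single = search_all(lambda q: q * 2, [3], pool)
    batch = search_all(lambda q: q * 2, [1, 2, 3], pool)
```

With a single query the pool is never touched, so the caller pays no thread hand-off cost for the most common request shape in this thread.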

mdouze avatar Aug 28 '24 10:08 mdouze

Thanks. I read https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors, so I tried using an HNSW index instead of an IVF index. But with the HNSW index, performance is poor under a JMeter stress test that simulates users accessing the index simultaneously.

I'm planning to use it in a production environment. Each search contains a single query, but multiple users will be accessing it simultaneously. Could you offer some advice?

IcanDoItL avatar Aug 28 '24 11:08 IcanDoItL