faiss

Performance issues with scale

Open caiowakamatsuahrefs opened this issue 3 years ago • 3 comments

Summary

We have a use case where we need to store ~3,300,000 768-dimensional vectors and search them quickly. Our current approach divides these documents into 50 partitions, and the index configuration for each partition is determined by this code snippet:

[[nodiscard]] std::string index_config(uint32_t document_count)
{
  int k = 65526;

  if (document_count < 2'000'000)
  {
    k = 32768;
  }

  if (document_count < 1'000'000)
  {
    k = 8 * std::sqrt(document_count);
  }

  if (document_count < 57'600)
  {
    return "flat";
  }

  return fmt::format("OPQ56_224,IVF{},PQ56", k);
}
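To make the sizing rule concrete, here is a minimal sketch of what the snippet above produces for a partition of roughly 65k vectors. The names `ivf_list_count` and `index_config_sketch` are hypothetical, and `std::to_string` stands in for `fmt::format` so the example builds with the standard library alone.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical helper: same thresholds as the snippet above, returning only
// the number of IVF lists (k) for a given partition size.
int ivf_list_count(std::uint32_t document_count)
{
  if (document_count < 1'000'000)
  {
    // The fractional part is truncated, as in the original int assignment.
    return static_cast<int>(8 * std::sqrt(document_count));
  }
  if (document_count < 2'000'000)
  {
    return 32768;
  }
  return 65526;  // value as posted in the snippet
}

// Hypothetical stand-in for index_config that avoids the fmt dependency.
std::string index_config_sketch(std::uint32_t document_count)
{
  if (document_count < 57'600)
  {
    return "flat";
  }
  return "OPQ56_224,IVF" + std::to_string(ivf_list_count(document_count)) + ",PQ56";
}
```

For a partition of 65,000 vectors this gives `OPQ56_224,IVF2039,PQ56`, so each inverted list holds about 32 vectors on average (65,000 / 2039).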

Indexing time is OK, but performance degrades when we run multiple searches in parallel. In our setup, each search spawns 50 threads, one per partition (this runs on servers with 256 hardware threads). With up to 10 searches running in parallel, some searches start to take ~4 s or more.

Faiss version: V1.7.2

Installed from: source, compiled together with the project

Faiss compilation options: Using MKL - faiss-avx2 target in cmake

Running on:

  • [x] CPU
  • [ ] GPU

Interface:

  • [x] C++
  • [ ] Python

Reproduction instructions

Create 50 partitions with roughly 65k 768-dimensional vectors in each. For each partition, use the code above to define the index type.

After training and adding the documents, launch 50 threads to search through the partitions, and run 10 such searches at a time (so in this case, 500 threads are spawned).

What is the recommended way to set up something like this? The server it runs on does not have a GPU.

caiowakamatsuahrefs, Sep 30 '22 04:09

If you are running searches in multiple threads, you should disable threading on the Faiss (and MKL) side. This is done via omp_set_num_threads(1) or the environment variable OMP_NUM_THREADS=1. Note that merely calling a library linked with OpenMP incurs a runtime overhead when starting a new thread with pthread_create. This overhead is visible when many short-lived threads are spawned. To avoid this, compile Faiss without OpenMP (remove the OpenMP flag, e.g. -fopenmp with GCC or Clang, from the compilation options).
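As an illustration of keeping the application-side thread count bounded, here is a minimal sketch that searches all partitions with a fixed number of workers instead of one thread per partition. It is an assumption about how the caller could be structured, not something taken from Faiss: `PartitionSearch` and `search_partitions` are hypothetical names, and the callback stands in for a real per-partition search. With Faiss threading disabled as described above, each callback would run single-threaded.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-partition search. In a real setup this
// would wrap the search call on that partition's index and return, for
// example, the best distance found there.
using PartitionSearch = std::function<float(std::size_t partition)>;

// Runs `search` once per partition using at most `worker_count` threads.
// Workers pull partition ids from a shared atomic counter, so the number of
// threads does not grow with the number of partitions.
std::vector<float> search_partitions(std::size_t partition_count,
                                     std::size_t worker_count,
                                     const PartitionSearch& search)
{
  std::vector<float> results(partition_count, 0.0f);
  std::atomic<std::size_t> next{0};

  auto worker = [&] {
    for (std::size_t p = next.fetch_add(1); p < partition_count;
         p = next.fetch_add(1))
    {
      results[p] = search(p);  // each slot is written by exactly one worker
    }
  };

  // Never start more workers than partitions, and always at least one.
  worker_count =
      std::max<std::size_t>(1, std::min(worker_count, partition_count));

  std::vector<std::thread> workers;
  workers.reserve(worker_count);
  for (std::size_t i = 0; i < worker_count; ++i)
  {
    workers.emplace_back(worker);
  }
  for (auto& t : workers)
  {
    t.join();
  }
  return results;
}
```

This sketch still creates its workers on every call, so it only reduces the number of short-lived threads. A pool that is created once and reused across searches would avoid repeated thread creation altogether.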

mdouze, Sep 30 '22 07:09

See also https://github.com/facebookresearch/faiss/wiki/Threads-and-asynchronous-calls#performance-of-search

mdouze, Sep 30 '22 07:09

Thanks for the quick reply! I'll implement these and get back to you.

caiowakamatsuahrefs, Sep 30 '22 09:09