Performance issues with scale
Summary
We have a use case where we need to store ~3,300,000 768-dimensional vectors and search them quickly. Our current approach divides these documents into 50 partitions, where the index configuration for each partition is determined by this code snippet.
[[nodiscard]] std::string index_config(uint32_t document_count)
{
    int k = 65526;
    if (document_count < 2'000'000)
    {
        k = 32768;
    }
    if (document_count < 1'000'000)
    {
        k = 8 * std::sqrt(document_count);
    }
    if (document_count < 57'600)
    {
        return "flat";
    }
    return fmt::format("OPQ56_224,IVF{},PQ56", k);
}
Indexing time is fine, but search slows down when multiple searches run in parallel. Each search spawns 50 threads, one per partition (this runs on servers with 256 hardware threads). With up to 10 searches running in parallel, some searches start to take ~4s or more.
Faiss version: V1.7.2
Installed from: Source - compiled together with the project
Faiss compilation options: Using MKL - faiss-avx2 target in cmake
Running on:
- [x] CPU
- [ ] GPU
Interface:
- [x] C++
- [ ] Python
Reproduction instructions
Create 50 partitions with roughly 65k 768d vectors in each. For each partition, use the code above to define the index type.
After training and adding the documents, launch 50 threads per search (one per partition) and run 10 searches at a time, so in this case 500 threads are spawned.
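As a worked example of what the configuration function returns for partitions of this size, here is a sketch of the same logic with `std::to_string` swapped in for `fmt::format` so that it has no external dependency. The name `index_config_no_fmt` is made up for this illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical stand-alone variant of index_config() from the summary.
// Same thresholds; only the string formatting differs.
std::string index_config_no_fmt(std::uint32_t document_count)
{
    if (document_count < 57'600)
    {
        return "flat";
    }
    int k = 65526;
    if (document_count < 2'000'000)
    {
        k = 32768;
    }
    if (document_count < 1'000'000)
    {
        // Truncates toward zero, like the implicit conversion in the original.
        k = static_cast<int>(8 * std::sqrt(document_count));
    }
    return "OPQ56_224,IVF" + std::to_string(k) + ",PQ56";
}
```

With 65,000 vectors in a partition, 8 * sqrt(65000) is about 2039.6, so this returns `OPQ56_224,IVF2039,PQ56`. Anything under 57,600 vectors returns `flat`.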
What is the recommended way to set something like this up? The server this is running on does not have a GPU.
If you are running searches in multiple threads, you should disable threading on the Faiss (and MKL) side.
This is done via omp_set_num_threads(1) or the environment variable OMP_NUM_THREADS=1.
Note that merely calling a library linked with OpenMP incurs a runtime overhead when starting a new thread with pthread_create. This overhead is visible when many short-lived threads are spawned.
To avoid this, compile Faiss without OpenMP (remove -openmp from the compilation options).
See also https://github.com/facebookresearch/faiss/wiki/Threads-and-asynchronous-calls#performance-of-search
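Another way to avoid spawning 50 short-lived threads per search is to keep a fixed set of worker threads alive and hand them per-partition tasks. The following is a minimal sketch of that idea using only the C++ standard library. It is not part of Faiss, and `WorkerPool`, `run_all`, and the `search_partition` call in the usage note are hypothetical names.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: threads are created once and reused.
class WorkerPool
{
public:
    explicit WorkerPool(std::size_t num_workers)
    {
        for (std::size_t i = 0; i < num_workers; ++i)
        {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    ~WorkerPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& worker : workers_)
        {
            worker.join();
        }
    }

    WorkerPool(const WorkerPool&) = delete;
    WorkerPool& operator=(const WorkerPool&) = delete;

    // Runs task(0) .. task(count - 1) on the pool and blocks until all finish.
    // Tasks must not throw; error handling is left out of this sketch.
    void run_all(std::size_t count, const std::function<void(std::size_t)>& task)
    {
        std::mutex done_mutex;
        std::condition_variable done_cv;
        std::size_t remaining = count;
        for (std::size_t i = 0; i < count; ++i)
        {
            enqueue([&, i] {
                task(i);
                std::lock_guard<std::mutex> lock(done_mutex);
                if (--remaining == 0)
                {
                    done_cv.notify_one();
                }
            });
        }
        std::unique_lock<std::mutex> lock(done_mutex);
        done_cv.wait(lock, [&] { return remaining == 0; });
    }

private:
    void enqueue(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

    void worker_loop()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty())
                {
                    return; // stopping_ is set and no work is left
                }
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A search over 50 partitions then becomes `pool.run_all(50, [&](std::size_t p) { search_partition(p, query); });`, with one pool shared by all concurrent searches and sized to the machine rather than to the number of partitions. Here `search_partition` stands in for whatever code calls the index search on partition `p`.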
Thanks for the quick reply! I'll implement these and get back to you.