faiss
Need help picking an index
Summary
I am using faiss-cpu and have created four IndexFlatIP indexes over 1024-dimensional vectors. The four indexes correspond to four product datasets containing 1,118,514, 4,118,139, 1,283,033, and 4,143,605 products, so the full dataset holds roughly 10.7 million products. For each query product, I want to find the 100 most similar products (nearest neighbors).
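Searching four `IndexFlatIP` indexes and combining their results amounts to an exhaustive inner-product scan of every dataset, followed by a merge of the per-dataset top-100 lists into a single global top-100. The numpy sketch below illustrates that procedure. The function `search_shards` and the small random arrays are hypothetical stand-ins for the real indexes, not faiss code. The sketch also shows where the cost comes from: every query is compared against all of the stored vectors (about 10.7 million, the sum of the four dataset sizes).

```python
import numpy as np


def search_shards(shards, query, k):
    """Exact top-k inner-product search over several shards, merged globally.

    shards: list of (n_i, d) float32 arrays, one per dataset.
    query:  (d,) float32 array.
    Returns (scores, ids) sorted by descending score; ids index into the
    concatenation of all shards, in the order they are given.
    """
    all_scores, all_ids = [], []
    offset = 0
    for xb in shards:
        scores = xb @ query  # inner product with every vector in this shard
        kk = min(k, len(scores))
        # Unordered top-kk of this shard (argpartition avoids a full sort).
        top = np.argpartition(-scores, kk - 1)[:kk]
        all_scores.append(scores[top])
        all_ids.append(top + offset)  # shift local ids to global ids
        offset += len(xb)
    scores = np.concatenate(all_scores)
    ids = np.concatenate(all_ids)
    order = np.argsort(-scores)[:k]  # merge step: keep the k best overall
    return scores[order], ids[order]


# Hypothetical stand-ins for the four datasets (the real ones hold
# millions of 1024-d vectors each).
rng = np.random.default_rng(0)
d = 64
shards = [rng.standard_normal((n, d)).astype(np.float32) for n in (500, 1200, 300, 900)]
query = rng.standard_normal(d).astype(np.float32)

scores, ids = search_shards(shards, query, k=100)
print(ids[:5], scores[:5])
```

The merge is exact: a vector in the global top-100 is necessarily in its own dataset's top-100, so keeping 100 candidates per dataset loses nothing.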
My pipeline currently takes ~7 minutes to process a single product, which is far too slow. I am trying to optimize it so that the runtime is under 5 seconds.
I tried FAISS-GPU, but it failed with an out-of-memory error because the datasets need more memory than the 32 GB available on the GPU.
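A rough estimate of vector storage alone shows why the flat float32 indexes do not fit: 10,663,291 vectors × 1024 dimensions × 4 bytes is about 43.7 GB. This is a sketch only. It ignores ids, any coarse-quantizer overhead, and the temporary buffers a GPU search allocates. The compressed encodings listed for comparison (float16, 8-bit scalar quantization, product quantization with 64 one-byte codes per vector) are example settings chosen for illustration, not recommendations.

```python
# Dataset sizes and dimension from the issue.
dataset_sizes = [1118514, 4118139, 1283033, 4143605]
d = 1024
n = sum(dataset_sizes)  # total number of product vectors


def gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Bytes stored per vector for a few example encodings (codes only).
bytes_per_vector = {
    "float32 (IndexFlatIP)": 4 * d,
    "float16": 2 * d,
    "8-bit scalar quantizer": 1 * d,
    "PQ, 64 x 8-bit codes": 64,
}

print(f"total vectors: {n:,}")
for name, b in bytes_per_vector.items():
    total = n * b
    print(f"{name:24s} {total / 1e9:6.1f} GB  ({gib(total):5.1f} GiB)")
```

Under these assumptions, float16 storage (about 21.8 GB) would fit within 32 GB while float32 (about 43.7 GB) does not, although search-time buffers add to the real footprint.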
Could the community help me pick an index that brings the runtime under 5 seconds without too large a hit to accuracy?
If the potential solution includes using faiss-gpu, that is perfectly fine too.
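faiss's IVF indexes (for example `IndexIVFFlat`) speed up search by clustering the database into `nlist` cells with k-means and scanning only the `nprobe` cells whose centroids best match the query, so `nprobe` is the knob that trades accuracy for speed. The toy numpy re-implementation below illustrates that trade-off on hypothetical clustered, normalized data. It is not faiss's implementation, and the sizes and `nlist`/`nprobe` values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: clustered, L2-normalized vectors standing in for
# product embeddings (real data and sizes will behave differently).
d, n, k = 32, 20000, 100
centers = rng.standard_normal((200, d))
xb = centers[rng.integers(0, 200, n)] + 0.3 * rng.standard_normal((n, d))
xb = xb / np.linalg.norm(xb, axis=1, keepdims=True)


def train_coarse(xb, nlist, iters=5):
    """Spherical k-means: nlist unit-norm centroids for inner-product assignment."""
    centroids = xb[rng.choice(len(xb), nlist, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmax(xb @ centroids.T, axis=1)
        for c in range(nlist):
            members = xb[assign == c]
            if len(members):
                mean = members.mean(axis=0)
                centroids[c] = mean / np.linalg.norm(mean)
    return centroids


nlist = 128
centroids = train_coarse(xb, nlist)
assign = np.argmax(xb @ centroids.T, axis=1)
# Inverted lists: ids of the database vectors that fall in each cell.
lists = [np.flatnonzero(assign == c) for c in range(nlist)]


def ivf_search(q, nprobe):
    """Scan only the nprobe cells whose centroids score highest against q."""
    probe = np.argsort(-(centroids @ q))[:nprobe]
    cand = np.concatenate([lists[c] for c in probe])
    scores = xb[cand] @ q
    return cand[np.argsort(-scores)[:k]]


q = xb[0] + 0.1 * rng.standard_normal(d)
exact = set(np.argsort(-(xb @ q))[:k].tolist())
recalls = {}
for nprobe in (1, 4, 16, nlist):
    found = set(ivf_search(q, nprobe).tolist())
    recalls[nprobe] = len(exact & found) / k
    print(f"nprobe={nprobe:3d}/{nlist}  recall@{k}={recalls[nprobe]:.2f}")
```

In this sketch recall never decreases as `nprobe` grows and reaches 1.0 when every cell is scanned. IVF by itself does not shrink memory, because each cell still stores full vectors. To fit within the GPU limit, it could be combined with a compressed encoding such as SQ8 or PQ.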
Platform
OS: Windows 10 Enterprise 19045.3693 (With the option to run faiss-gpu on Ubuntu 20.04 LTS)
Faiss version: 1.7.4
Installed from: Pypi
Running on:
- [x] CPU
- [ ] GPU
Interface:
- [ ] C++
- [x] Python
https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index