Need help picking an index

Open gajghatenv opened this issue 1 year ago • 1 comment

Summary

I am using faiss-cpu and have created four IndexFlatIP indexes over 1024-dimensional vectors. The four indexes correspond to four product datasets of sizes 1118514, 4118139, 1283033, and 4143605, so in total my dataset contains roughly 10.7 million products. For a given product, I am trying to find the 100 most similar products (nearest neighbors).

The current runtime of my pipeline for a single product is ~7 minutes, which is far too high. I am trying to optimize the pipeline so that the runtime is under 5 seconds.

I tried faiss-gpu, but it gave me an out-of-memory error because the datasets need more memory than the GPU has available (32 GB).
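The out-of-memory error is consistent with a back-of-the-envelope estimate. The sketch below sums the four dataset sizes from this post and assumes each vector is stored uncompressed as 1024 float32 values (4 bytes each), which is how a flat index keeps its data. It ignores any index overhead and temporary GPU buffers, so the real footprint would be somewhat higher.

```python
# Rough memory estimate for storing all vectors uncompressed (float32).
# Dataset sizes and dimension are taken from the post; everything else
# is plain arithmetic.
dataset_sizes = [1_118_514, 4_118_139, 1_283_033, 4_143_605]
dim = 1024
bytes_per_float32 = 4

total_vectors = sum(dataset_sizes)
flat_bytes = total_vectors * dim * bytes_per_float32
flat_gib = flat_bytes / 2**30

gpu_budget_bytes = 32 * 2**30  # the 32 GB mentioned above, read as GiB

print(f"total vectors: {total_vectors:,}")
print(f"raw float32 storage: {flat_gib:.1f} GiB")
print(f"fits in GPU memory: {flat_bytes <= gpu_budget_bytes}")
```

Under these assumptions the raw vectors alone exceed 32 GB, so any GPU solution would need a compressed index or a split of the data across devices.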

I was wondering if the community could help me pick an index that achieves a runtime of <5 seconds without accuracy taking too big a hit.

If the potential solution includes using faiss-gpu, that is perfectly fine too.

Platform

OS: Windows 10 Enterprise 19045.3693 (With the option to run faiss-gpu on Ubuntu 20.04 LTS)

Faiss version: 1.7.4

Installed from: PyPI

Running on:

  • [x] CPU
  • [ ] GPU

Interface:

  • [ ] C++
  • [x] Python

gajghatenv avatar Jan 15 '24 17:01 gajghatenv

https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index

algoriddle avatar Jan 18 '24 15:01 algoriddle