Clarification Needed: Differences Between Faiss IndexFlatIP Search Results and Cosine Similarity
Summary
Platform
OS: Ubuntu20
Faiss version: latest
Installed from: source build
Faiss compilation options:
Running on:
- [v] CPU
- [v] GPU
Interface:
- [ ] C++
- [v] Python
Reproduction instructions
I have run into some inconsistencies while using Faiss for indexing and search. My main concern is a discrepancy between the results of searching an IndexFlatIP index and the cosine similarity values computed with scipy and scikit-learn on the same dataset. The values returned by the Faiss search give a higher matching rate.
Specifically, there are occasional mismatches between the similarity values from cosine_similarity(np.expand_dims(face_embedding, axis=0), index_np) and the distance values (D) returned by index.search. These discrepancies do not appear consistently, but they are noticeable in certain instances.
I am curious about how Faiss handles distance calculations and whether there is any additional preprocessing applied to feature vectors post L2-normalization within Faiss. Any clarification or additional information on this matter would be immensely helpful.
Your insights and experiences could greatly assist in enhancing my understanding and in finding a resolution to these inconsistent results. Thank you in advance for your time and assistance.
Best regards,
Faiss L2-normalizes neither the query nor the database vectors. Did you normalize the vectors yourself before adding them to the index and querying?
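To illustrate why the normalization matters, here is a minimal NumPy-only sketch. It assumes random stand-in embeddings; `l2_normalize` and `cosine` are hypothetical helper names and are not part of Faiss, scipy, or scikit-learn. It shows that a plain inner product only matches cosine similarity once both the query and the database vectors have unit L2 norm, and that a float32 result can still differ from a float64 reference by a small rounding error.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Return a float32 copy of x with every row scaled to unit L2 norm."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against all-zero rows


def cosine(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Reference cosine similarity, computed in float64."""
    q = np.asarray(query, dtype=np.float64)
    d = np.asarray(database, dtype=np.float64)
    numerator = q @ d.T
    denominator = np.linalg.norm(q, axis=1, keepdims=True) * np.linalg.norm(d, axis=1)
    return numerator / denominator


# Random stand-ins for the embeddings (assumption: 100 vectors of dimension 128).
rng = np.random.default_rng(0)
database = rng.normal(size=(100, 128)).astype(np.float32)
query = rng.normal(size=(1, 128)).astype(np.float32)

raw_ip = query @ database.T                                # inner product on raw vectors
unit_ip = l2_normalize(query) @ l2_normalize(database).T   # inner product on unit vectors
reference = cosine(query, database)

# Raw inner products are far from the cosine values; normalized ones agree
# up to float32 rounding error.
max_err = float(np.abs(unit_ip - reference).max())
print("max |unit_ip - cosine|:", max_err)
print("max |raw_ip  - cosine|:", float(np.abs(raw_ip - reference).max()))
```

If the differences seen in practice are of the same small order as `max_err`, floating-point rounding is a plausible explanation. Larger gaps would suggest that one of the two inputs was not normalized.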
Yes, I have indeed applied L2 normalization to the vectors before adding them to the index and querying. My confusion arises from the fact that while some results from the Faiss search align perfectly with those obtained from my own inner product calculations, there are occasional discrepancies. This led me to wonder if Faiss implements any additional steps or calculations that might account for these differences.
Any insights into this would be greatly appreciated.
Hi @YoungjaeDev, as @algoriddle mentioned, you need to normalise all vectors before constructing the index and use the inner product metric. Could you provide a toy example that reproduces the issue so that this is actionable?
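A toy example of the kind requested could look like the sketch below. It is not the reporter's code. The data is random, the sizes (1000 vectors of dimension 512, k=5) are assumptions, and `unit_rows`, `reference_top_k`, and `faiss_top_k` are hypothetical helper names. Only `faiss.IndexFlatIP`, `index.add`, and `index.search` are Faiss calls. The Faiss import is optional so that the NumPy reference still runs without it.

```python
import numpy as np

try:
    import faiss  # optional: the comparison is skipped if Faiss is not installed
except ImportError:
    faiss = None


def unit_rows(x):
    """Return x as contiguous float32 with each row scaled to unit L2 norm."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def reference_top_k(query, database, k):
    """Exact top-k inner products, computed in float64 with NumPy."""
    scores = query.astype(np.float64) @ database.astype(np.float64).T
    order = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(scores, order, axis=1), order


def faiss_top_k(query, database, k):
    """Top-k inner products from a flat (exhaustive) Faiss index."""
    index = faiss.IndexFlatIP(database.shape[1])
    index.add(database)
    return index.search(query, k)  # returns (D, I)


rng = np.random.default_rng(0)
database = unit_rows(rng.normal(size=(1000, 512)))
query = unit_rows(rng.normal(size=(1, 512)))

ref_scores, ref_ids = reference_top_k(query, database, k=5)

if faiss is not None:
    scores, ids = faiss_top_k(query, database, k=5)
    print("max |D - reference|:", float(np.abs(scores - ref_scores).max()))
    print("same neighbours:", bool(np.array_equal(ids, ref_ids)))
```

Replacing the random arrays with the real embeddings and posting both printed lines would show whether the mismatch is at the level of float32 rounding or something larger.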