Clarification Needed: Differences Between Faiss IndexFlatIP Search Results and Cosine Similarity

Open YoungjaeDev opened this issue 1 year ago • 2 comments

Summary

Platform

OS: Ubuntu20

Faiss version: latest

Installed from: source build

Faiss compilation options:

Running on:

  • [v] CPU
  • [v] GPU

Interface:

  • [ ] C++
  • [v] Python

Reproduction instructions

I have a question about some inconsistencies I've encountered while using Faiss for indexing and search. My primary concern is the discrepancy between the results returned by searching an IndexFlatIP index and those calculated using cosine similarity (both scipy and scikit-learn) on the same dataset. The values from the Faiss search have a higher matching rate.

Specifically, there are occasional mismatches between the distance values (D) from cosine_similarity(np.expand_dims(face_embedding, axis=0), index_np) and those obtained from index.search. While these discrepancies aren't constant, they are noticeable in certain instances.

I am curious about how Faiss handles distance calculations and whether there is any additional preprocessing applied to feature vectors post L2-normalization within Faiss. Any clarification or additional information on this matter would be immensely helpful.

Your insights and experiences could greatly assist in enhancing my understanding and in finding a resolution to these inconsistent results. Thank you in advance for your time and assistance.

Best regards,

YoungjaeDev avatar Jan 04 '24 02:01 YoungjaeDev

Faiss L2-normalizes neither the query nor the database vectors. Have you done this normalization on the vectors yourself before adding them to the index and querying?

algoriddle avatar Jan 08 '24 14:01 algoriddle
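
To illustrate the point in the comment above, here is a minimal NumPy-only sketch (it does not call Faiss) of why a plain inner product matches cosine similarity only when both sides are L2-normalized first. The data, sizes, and the helper `l2_normalize` are hypothetical and chosen for illustration; with Faiss itself, the in-place `faiss.normalize_L2` on float32 arrays serves the same purpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 database vectors and 1 query, 8 dimensions, float32.
db = rng.standard_normal((5, 8)).astype(np.float32)
query = rng.standard_normal((1, 8)).astype(np.float32)


def l2_normalize(x):
    """Return a copy of x with every row scaled to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Cosine similarity computed directly from its definition.
cosine = (query @ db.T) / (
    np.linalg.norm(query, axis=1, keepdims=True) * np.linalg.norm(db, axis=1)
)

# Inner product on the raw vectors (what an inner-product index would
# score if nothing was normalized beforehand).
raw_ip = query @ db.T

# Inner product after normalizing both the query and the database.
normalized_ip = l2_normalize(query) @ l2_normalize(db).T

print("raw inner product equals cosine:       ", np.allclose(raw_ip, cosine))
print("normalized inner product equals cosine:", np.allclose(normalized_ip, cosine, atol=1e-6))
```

If only one side is normalized (for example, the database but not the query), the scores are scaled by the norm of the other side and will still not match cosine similarity.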

Faiss L2-normalizes neither the query nor the database vectors. Have you done this normalization on the vectors yourself before adding them to the index and querying?

Yes, I have indeed applied L2 normalization to the vectors before adding them to the index and querying. My confusion arises from the fact that while some results from the Faiss search align perfectly with those obtained from my own inner product calculations, there are occasional discrepancies. This led me to wonder if Faiss implements any additional steps or calculations that might account for these differences.

Any insights into this would be greatly appreciated.

YoungjaeDev avatar Jan 08 '24 14:01 YoungjaeDev

Hi @YoungjaeDev, as @algoriddle mentioned, you need to normalise all vectors before constructing the index and use the inner product metric. Could you provide a toy example that reproduces the issue so this is actionable?

mlomeli1 avatar Jun 19 '24 14:06 mlomeli1
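
A toy reproduction along the lines requested above might look like the following sketch. It is not the original poster's code: the random data, the sizes `d`, `n`, `k`, and the brute-force NumPy reference are all assumptions for illustration. The Faiss import is guarded only so that the reference part still runs where Faiss is not installed. Differences on the order of float32 rounding are expected, because the two implementations may sum the products in a different order; larger differences or different neighbor ids would be worth reporting.

```python
import numpy as np

try:
    import faiss  # guarded so the NumPy reference below runs without it
except ImportError:
    faiss = None

rng = np.random.default_rng(0)
d, n, k = 128, 1000, 5  # hypothetical dimension, database size, top-k

# Random float32 vectors, L2-normalized in place before indexing and querying.
db = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal((1, d)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)
query /= np.linalg.norm(query, axis=1, keepdims=True)

# Brute-force reference: inner product of unit vectors is cosine similarity.
scores = (query @ db.T)[0]
ref_I = np.argsort(-scores)[:k]  # ids of the k highest scores
ref_D = scores[ref_I]            # the corresponding scores, descending

if faiss is not None:
    index = faiss.IndexFlatIP(d)  # exact inner-product search
    index.add(db)
    D, I = index.search(query, k)
    print("max abs score difference:", float(np.abs(D[0] - ref_D).max()))
    print("same neighbor ids:", np.array_equal(I[0], ref_I))
else:
    print("faiss not installed; reference top-k scores:", ref_D)
```

Swapping the random data for the actual embeddings that show the mismatch, and printing the offending rows, would make the report directly actionable.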