Getting different cosine similarity for "Flat" and "HNSW32Flat" indexes
Summary
Platform
OS: linux
Faiss version:
Installed from:
Faiss compilation options:
Running on:
- [x] CPU
- [ ] GPU
Interface:
- [x] C++
- [x] Python
Reproduction instructions
Hello,
I am trying to compute cosine similarity with HNSW, but the result is incorrect. Below are the code and a comparison of "Flat", "HNSW" and scipy.
import faiss
import numpy as np
emb1 = np.fromfile("emb1.raw", dtype=np.float32)
emb2 = np.fromfile("emb2.raw", dtype=np.float32)
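The loading lines assume that each .raw file is a headerless dump of exactly one 128-dimensional float32 vector. Here is a small sketch that checks this assumption and returns the (n, d) C-contiguous float32 layout that the Faiss Python API expects. The helper name `load_raw_embeddings` is hypothetical, and the synthetic vector stands in for emb1.raw.

```python
import os
import tempfile

import numpy as np

D = 128  # dimensionality used by the index_factory calls in this issue

def load_raw_embeddings(path, d=D):
    """Hypothetical helper: read a headerless float32 file as an (n, d) array."""
    flat = np.fromfile(path, dtype=np.float32)
    if flat.size % d != 0:
        raise ValueError(f"{path}: {flat.size} floats is not a multiple of d={d}")
    # One row per vector; Faiss requires C-contiguous float32 input
    return np.ascontiguousarray(flat.reshape(-1, d))

# Round-trip demo with a synthetic vector (stand-in for emb1.raw)
vec = np.random.default_rng(0).standard_normal(D).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "demo.raw")
vec.tofile(path)
loaded = load_raw_embeddings(path)
print(loaded.shape)
```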
Scipy code & result
from scipy import spatial
result = 1 - spatial.distance.cosine(emb1, emb2)
print('Cosine Similarity by scipy:{}'.format(result))
Result:
Cosine Similarity by scipy:0.991761326789856
IndexFlatIP/Flat code & result
xb = np.expand_dims(emb1, axis=0)  # shape (1, 128)
xq = np.expand_dims(emb2, axis=0)
index = faiss.index_factory(128, "Flat", faiss.METRIC_INNER_PRODUCT)
faiss.normalize_L2(xb)  # in place: unit length, so inner product == cosine similarity
index.add(xb)
faiss.normalize_L2(xq)
distances, labels = index.search(xq, 1)  # separate names so `index` is not overwritten
print('[FAISS] Cosine Similarity by Flat:{}'.format(distances))
Result:
[FAISS] Cosine Similarity by Flat:[[0.9917611]]
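The Flat result matches scipy because, once both vectors are L2-normalized, their inner product is exactly the cosine similarity. A NumPy-only sketch of that equivalence on random stand-in vectors (not the actual embeddings):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.standard_normal(128).astype(np.float32)  # stand-in for emb1
b = rng.standard_normal(128).astype(np.float32)  # stand-in for emb2

# Cosine similarity from its definition (equal to 1 - scipy's cosine distance)
cos_direct = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# What normalize_L2 followed by an inner-product search computes: normalize, then dot
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
cos_via_ip = float(np.dot(a_unit, b_unit))

print(cos_direct, cos_via_ip)
```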
IndexHNSWFlat/HNSW32Flat code & result
xb = np.expand_dims(emb1, axis=0)  # shape (1, 128)
xq = np.expand_dims(emb2, axis=0)
index = faiss.index_factory(128, "HNSW32Flat", faiss.METRIC_INNER_PRODUCT)
faiss.normalize_L2(xb)  # in place: unit length
index.add(xb)
faiss.normalize_L2(xq)
distances, labels = index.search(xq, 1)  # separate names so `index` is not overwritten
print('[FAISS] Cosine Similarity by HNSW32Flat:{}'.format(distances))
Result:
[FAISS] Cosine Similarity by HNSW32Flat:[[0.01647742]]
The Scipy and Flat results match, whereas the HNSW result is incorrect. I verified this with both the C++ and Python APIs.
This is with an old version of Faiss. "HNSW32Flat" is not a valid index_factory string; it should be "HNSW32,Flat". In addition, faiss.METRIC_INNER_PRODUCT is not taken into account, so the index computes (squared) L2 distances. That is fine; it only requires translating the distance back to cosine similarity:
2 - 2 * 0.9917611 = 0.0164778
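The translation follows from expanding the squared distance between unit vectors x and y: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y = 2 - 2 cos(x, y), so cos(x, y) = 1 - ||x - y||^2 / 2. A small sketch that applies both directions to the numbers in this thread and checks the identity on random unit vectors. The helper names are illustrative:

```python
import numpy as np

def cosine_to_l2sq(cos):
    """Squared L2 distance between two unit vectors with cosine similarity `cos`."""
    return 2.0 - 2.0 * cos

def l2sq_to_cosine(d2):
    """Cosine similarity of two unit vectors with squared L2 distance `d2`."""
    return 1.0 - d2 / 2.0

# Numbers reported in this thread
print(cosine_to_l2sq(0.9917611))   # Flat cosine -> L2 value HNSW returned
print(l2sq_to_cosine(0.01647742))  # HNSW L2 value -> cosine similarity

# Check the identity on random unit vectors (float64)
rng = np.random.default_rng(7)
a = rng.standard_normal(128)
b = rng.standard_normal(128)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)
d2 = float(np.sum((a - b) ** 2))
cos = float(a @ b)
print(d2, cos, l2sq_to_cosine(d2))
```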
@mdouze Thanks for your reply!
Yes, the installed Faiss Python package was version 1.5.3. After upgrading to 1.7.2, the issue was resolved.
I updated the call to:
faiss.index_factory(128, "HNSW32,Flat", faiss.METRIC_INNER_PRODUCT)
Correct result: 0.9917613
Note: this value comes directly from the API (the 2 - 2 * 0.9917611 = 0.0164778 conversion was not applied).
With the C++ API, however, I still get incorrect results (a Euclidean distance instead of cosine similarity) with the following code. Faiss was compiled from the repo (latest version).
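faiss::IndexHNSWFlat index(128, 64);              // d = 128, M = 64
index.metric_type = faiss::METRIC_INNER_PRODUCT;  // metric set after construction
normalize(xb);                                    // L2-normalize database vectors
index.add(xb);
normalize(xq);                                    // L2-normalize query vectors
index.search(...);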
Result: -0.0164774