Getting Cosine similarity different for "Flat" & "HNSW32Flat" Indexes

Open Kapil-23 opened this issue 3 years ago • 2 comments

Summary

Platform

OS: linux

Faiss version:

Installed from:

Faiss compilation options:

Running on:

  • [x] CPU
  • [ ] GPU

Interface:

  • [x] C++
  • [x] Python

Reproduction instructions

Hello,

I am trying to compute cosine similarity with an HNSW index, but the value it returns is incorrect. Below are the code and a comparison of the results from "Flat", "HNSW" and scipy.

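import faiss
import numpy as np

# Each file holds one raw float32 embedding (128 values, matching the index dimension below)
emb1 = np.fromfile("emb1.raw", dtype=np.float32)
emb2 = np.fromfile("emb2.raw", dtype=np.float32)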

Scipy code & result

from scipy import spatial
result = 1 - spatial.distance.cosine(emb1, emb2)
print('Cosine Similarity by scipy:{}'.format(result))

Result: Cosine Similarity by scipy:0.991761326789856

IndexFlatIP/Flat code & result

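xb = np.expand_dims(emb1, axis=0)  # single database vector, shape (1, 128)
xq = np.expand_dims(emb2, axis=0)  # single query vector, shape (1, 128)

index = faiss.index_factory(128, "Flat", faiss.METRIC_INNER_PRODUCT)
faiss.normalize_L2(xb)  # in-place L2 normalization
index.add(xb)
faiss.normalize_L2(xq)
# Store the returned ids under their own name so the index object is not overwritten
distance, labels = index.search(xq, 1)
print('[FAISS] Cosine Similarity by Flat:{}'.format(distance))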

Result: [FAISS] Cosine Similarity by Flat:[[0.9917611]]
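
Why the Flat result agrees with scipy: after L2 normalization, the inner product of two vectors is their cosine similarity. The following is a minimal standard-library sketch of that equivalence; the helper names and the small example vectors are made up for illustration and are not part of Faiss or of the embeddings above.

```python
import math

def inner_product(a, b):
    # Plain dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    # Scale v to unit Euclidean length (what faiss.normalize_L2 does per row)
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    # Textbook definition: dot(a, b) / (|a| * |b|)
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

# Hypothetical example vectors
a = [1.0, 2.0, 3.0]
b = [2.0, 1.0, 4.0]

ip_after_norm = inner_product(l2_normalize(a), l2_normalize(b))
cos = cosine_similarity(a, b)
print(ip_after_norm, cos)  # the two values agree up to floating-point rounding
```

The small difference between the scipy and Flat results above is consistent with this: both compute the same quantity, with differing floating-point precision.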

IndexHNSWFlat/HNSW32Flat code & result

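xb = np.expand_dims(emb1, axis=0)  # single database vector, shape (1, 128)
xq = np.expand_dims(emb2, axis=0)  # single query vector, shape (1, 128)

index = faiss.index_factory(128, "HNSW32Flat", faiss.METRIC_INNER_PRODUCT)
faiss.normalize_L2(xb)  # in-place L2 normalization
index.add(xb)
faiss.normalize_L2(xq)
# Store the returned ids under their own name so the index object is not overwritten
distance, labels = index.search(xq, 1)
print('[FAISS] Cosine Similarity by HNSW32Flat:{}'.format(distance))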

Result: [FAISS] Cosine Similarity by HNSW32Flat:[[0.01647742]]

The scipy and Flat results match, whereas the HNSW result is incorrect. I verified this with both the C++ and Python APIs.

Kapil-23, Jul 07 '22 05:07

This is with an old version of Faiss. HNSW32Flat is not a valid index_factory string; it should be HNSW32,Flat. In addition, faiss.METRIC_INNER_PRODUCT is not taken into account there, so the index computes (squared) L2 distances. This is fine; it just requires translating between the two, since for L2-normalized vectors the squared L2 distance is 2 - 2 * cosine similarity:

2 - 2 * 0.9917611 = 0.0164778
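
The translation can be applied in either direction. Here is a minimal sketch, assuming both vectors are L2-normalized and that the index returns squared L2 distances; the two function names are made up for illustration, and the inputs are the values reported earlier in this thread.

```python
def squared_l2_from_cosine(cos_sim):
    # For unit vectors a, b: |a - b|^2 = |a|^2 + |b|^2 - 2*dot(a, b) = 2 - 2*cos
    return 2.0 - 2.0 * cos_sim

def cosine_from_squared_l2(sq_dist):
    # Inverse of the identity above
    return 1.0 - sq_dist / 2.0

flat_result = 0.9917611   # inner product reported by the Flat index
hnsw_result = 0.01647742  # value reported by the HNSW32Flat index

print(squared_l2_from_cosine(flat_result))  # approximately 0.0164778
print(cosine_from_squared_l2(hnsw_result))  # approximately 0.9917613
```

Converting the HNSW value back this way recovers the cosine similarity without upgrading, as long as the vectors were normalized before being added and searched.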

mdouze, Jul 08 '22 08:07

@mdouze Thanks for your reply !!!

Yes, the installed Faiss Python version was 1.5.3. After upgrading to 1.7.2 the issue is resolved. I updated the call to faiss.index_factory(128, "HNSW32,Flat", faiss.METRIC_INNER_PRODUCT) and now get the correct result: 0.9917613

Note: this result comes directly from the API; I did not apply the 2 - 2 * 0.9917611 = 0.0164778 conversion.

In C++, however, I still face the same issue: I get a Euclidean distance instead of the cosine similarity. I am using the following code, with Faiss compiled from the repository (latest version).

faiss::IndexHNSWFlat index(128, 64);  // d = 128, M = 64
index.metric_type = faiss::METRIC_INNER_PRODUCT;

normalize(xb);  // L2-normalize the database vector
index.add(xb);
normalize(xq);  // L2-normalize the query vector

index.search(...);

Result: -0.0164774

Kapil-23, Jul 08 '22 10:07