
[Feature]: Support cosine similarity

Open xiaofan-luan opened this issue 2 years ago • 5 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Is your feature request related to a problem? Please describe.

A lot of NLP users want to use cosine similarity.

It's essentially just faiss.normalize_L2() followed by inner-product (IP) distance.
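A minimal NumPy sketch of the equivalence claimed above: scaling every vector to unit L2 norm and then taking the inner product gives the same scores as cosine similarity on the raw vectors. `normalize_rows` and `cosine_similarity` are hypothetical helpers; `normalize_rows` stands in for faiss.normalize_L2(), which does the same row-wise scaling in place on a float32 array. NumPy is used here only so the example runs without faiss installed.

```python
import numpy as np

def normalize_rows(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (same effect as faiss.normalize_L2, but not in place).

    Assumes no all-zero rows (their norm would be 0).
    """
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / norms

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norms = np.linalg.norm(a, axis=1)[:, None]
    b_norms = np.linalg.norm(b, axis=1)[None, :]
    return (a @ b.T) / (a_norms * b_norms)

rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 8)).astype(np.float32)
corpus = rng.standard_normal((5, 8)).astype(np.float32)

# Inner product (IP) computed on the normalized vectors...
ip_on_normalized = normalize_rows(queries) @ normalize_rows(corpus).T
# ...matches cosine similarity computed on the raw vectors.
cos = cosine_similarity(queries, corpus)
print(np.allclose(ip_on_normalized, cos, atol=1e-5))  # True
```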

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan, Apr 03 '23

Hey @xiaofan-luan, I would like to give this a shot. Could you point me to where to start?

amityahav, Apr 11 '23


Hi Amit,

  1. To begin with, you need to understand the differences between the cosine, L2, and inner product (IP) metrics. We currently support L2 and IP (https://www.milvus.io/docs/metric.md).
  2. To build an index, you specify index_params = { "metric_type":"L2", "index_type":"IVF_FLAT", "params":{"nlist":1024} }; we now want to add cosine as another metric type.
  3. To support cosine, you need to change the Milvus core (IP and cosine are nearly identical, except that cosine operates on normalized vectors), but it is not that hard: just check wherever the IP metric is used.
  4. Inside Knowhere, the vector engine of Milvus, you need a way to compute the cosine distance metric. I think @liliu-z is already working on it.
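To illustrate steps 3 and 4, here is a hypothetical Python sketch of a metric dispatch in which COSINE is added next to IP and simply reuses the IP kernel on normalized inputs. This is not Milvus or Knowhere code; `METRICS`, `distance`, and the kernel functions are made-up names for illustration. The index_params dict mirrors the one in step 2, with "COSINE" as the metric type.

```python
import numpy as np

def inner_product(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b))

def l2_squared(a: np.ndarray, b: np.ndarray) -> float:
    # Squared Euclidean distance (illustrative only).
    diff = a - b
    return float(np.dot(diff, diff))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine reuses the IP kernel after scaling both vectors to unit length.
    # Assumes non-zero vectors.
    return inner_product(a / np.linalg.norm(a), b / np.linalg.norm(b))

# Hypothetical metric registry: adding "COSINE" is one more entry next to "IP".
METRICS = {"L2": l2_squared, "IP": inner_product, "COSINE": cosine}

def distance(metric_type: str, a, b) -> float:
    try:
        fn = METRICS[metric_type]
    except KeyError:
        raise ValueError(f"unsupported metric_type: {metric_type!r}") from None
    return fn(np.asarray(a, dtype=np.float32), np.asarray(b, dtype=np.float32))

# Same shape as the index_params in step 2, with the new metric type.
index_params = {"metric_type": "COSINE", "index_type": "IVF_FLAT", "params": {"nlist": 1024}}

x, y = [3.0, 4.0], [4.0, 3.0]
print(distance("IP", x, y))                                    # 24.0
print(round(distance(index_params["metric_type"], x, y), 4))  # 0.96
```

Keeping cosine as a thin wrapper over IP means every IP code path can serve cosine once inputs are normalized, which is the point made in step 3.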

xiaofan-luan, Apr 11 '23

@xiaofan-luan Thank you, I will take a look.

amityahav, Apr 11 '23

Hi @amityahav, yes, we are working on this. The vector-index work for Milvus lives in https://github.com/milvus-io/knowhere, and any contributions and questions are more than welcome.

liliu-z, Apr 12 '23

/assign @liliu-z

liliu-z, Apr 12 '23

@liliu-z Hi, is there any update on this issue? I'd like to work on it.

caesarjuly, May 04 '23


I thought @cydrain was working on it. Any progress?

xiaofan-luan, May 04 '23

@xiaofan-luan Thanks for the quick reply. I looked around and found some COSINE-related code in Knowhere, so it has probably been done already. BTW, I'm looking for opportunities to contribute to open-source projects and find Milvus an excellent candidate. If there is any good first issue pending, I will be happy to help. I'm a senior MLE and quite familiar with ML and embeddings. Thanks!

caesarjuly, May 04 '23

I think this issue can be closed, right? If the vectors are normalized (i.e., have a magnitude of 1), the dot product is exactly the same as cosine similarity, because cos(θ) = inner product / product of magnitudes.
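The identity behind this, written out for vectors x and y with angle θ between them; the bound on the right follows from the Cauchy-Schwarz inequality and is why IP scores on unit vectors never exceed 1:

```latex
\cos\theta = \frac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert},
\qquad
\lVert\mathbf{x}\rVert = \lVert\mathbf{y}\rVert = 1
\;\Longrightarrow\;
\cos\theta = \mathbf{x}\cdot\mathbf{y} = \mathrm{IP}(\mathbf{x},\mathbf{y}),
\qquad
-1 \le \cos\theta \le 1 .
```

Equivalently, for arbitrary non-zero vectors, \mathrm{IP}(\mathbf{x}/\lVert\mathbf{x}\rVert,\ \mathbf{y}/\lVert\mathbf{y}\rVert) = \cos\theta.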

garyhlai, May 09 '23

Hi @garyhlai, so as you said, I can use IP in the search param config to compute cosine distance when my vectors are normalized to [0,1], right?

duongktr, Oct 03 '23


Use 2.3.1, where you can use the COSINE metric directly.

xiaofan-luan, Oct 03 '23

@xiaofan-luan I'm currently on v2.2.x for both pymilvus and the Milvus service (running in Docker). Should I upgrade both, or do I only need to upgrade pymilvus? Alternatively, since I'd rather not upgrade, I think I just need to change the metric param to IP, because my embedding vectors are normalized to (0,1).

duongktr, Oct 04 '23

Use IP distance and normalize your vectors; you will get the same distance as cosine (up to floating-point error).
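A sketch of the two options discussed here, under stated assumptions: `metric_setup` and `prepare_vectors` are hypothetical helpers, the version cutoff assumes native COSINE is available from 2.3 on (the thread mentions 2.3.1), and the nlist/nprobe values are placeholders. Note that "normalized" must mean unit L2 norm for every inserted and every query vector; scaling the components into (0,1) does not make IP equal to cosine.

```python
import numpy as np

def metric_setup(server_version: str):
    """Hypothetical helper: choose how to get cosine search for a given Milvus server version.

    Assumption: COSINE is native from 2.3 on; older servers fall back to IP
    on vectors normalized client-side.
    """
    major, minor = (int(p) for p in server_version.split(".")[:2])
    native_cosine = (major, minor) >= (2, 3)
    metric = "COSINE" if native_cosine else "IP"
    index_params = {"metric_type": metric, "index_type": "IVF_FLAT", "params": {"nlist": 1024}}
    search_params = {"metric_type": metric, "params": {"nprobe": 16}}
    needs_client_normalization = not native_cosine
    return index_params, search_params, needs_client_normalization

def prepare_vectors(vectors, normalize: bool) -> np.ndarray:
    """Scale rows to unit L2 norm when the server has no native COSINE (assumes non-zero rows)."""
    arr = np.asarray(vectors, dtype=np.float32)
    if normalize:
        arr = arr / np.linalg.norm(arr, axis=1, keepdims=True)
    return arr

idx, srch, needs_norm = metric_setup("2.2.14")
emb = prepare_vectors([[3.0, 4.0], [1.0, 0.0]], needs_norm)
print(idx["metric_type"], needs_norm)  # IP True
```

Apply the same prepare_vectors call to both inserted vectors and queries; on a 2.3+ server the helper returns COSINE and leaves the vectors untouched, and the dicts are meant to be passed to whatever pymilvus index-creation and search calls you already use.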

xiaofan-luan, Oct 04 '23

Below what cosine similarity will a result not be returned? 0.5? 0? Can I modify this threshold?

YangZhaoo, Dec 04 '23

You can normalize your vectors so that IP is the same as the cosine metric. The maximum distance will then be 1.
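A plain top-k search returns the k best-scoring results without a similarity cutoff, so one way to apply a threshold is on the client. Below is a hypothetical post-filter, `filter_by_cosine` (name and parameters are illustrative, not a Milvus API), that ranks by IP on unit vectors, which equals cosine and is therefore bounded in [-1, 1], keeps the top-k, and then drops hits below a chosen threshold.

```python
import numpy as np

def filter_by_cosine(query, candidates, top_k: int, min_similarity: float):
    """Hypothetical client-side post-filter: top-k by IP on unit vectors, then apply a cosine cutoff."""
    q = np.asarray(query, dtype=np.float32)
    q = q / np.linalg.norm(q)
    c = np.asarray(candidates, dtype=np.float32)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity, bounded in [-1, 1]
    order = np.argsort(-scores)[:top_k]  # indices of the top-k scores, best first
    return [(int(i), float(scores[i])) for i in order if scores[i] >= min_similarity]

candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]]
hits = filter_by_cosine([1.0, 0.0], candidates, top_k=3, min_similarity=0.5)
print([i for i, _ in hits])  # [0, 3]
```

Here the top 3 by score are indices 0 (1.0), 3 (about 0.707), and 1 (0.0); the 0.5 cutoff then removes index 1.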

xiaofan-luan, Dec 04 '23


OK, thank you.

YangZhaoo, Dec 04 '23