milvus
[Feature]: Support cosine similarity
Is there an existing issue for this?
- [X] I have searched the existing issues
Is your feature request related to a problem? Please describe.
A lot of NLP users want to use cosine similarity.
It's effectively nothing but faiss.normalize_L2() + IP distance.
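That equivalence can be sketched in plain numpy (faiss.normalize_L2() performs the same in-place L2 normalization):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def normalized_ip(a, b):
    # L2-normalize each vector (what faiss.normalize_L2 does in place),
    # then take the plain inner product.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return np.dot(a, b)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
assert np.isclose(cosine_sim(a, b), normalized_ip(a, b))
```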
Describe the solution you'd like.
No response
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response
@xiaofan-luan hey, I would like to give it a shot. Could you point me to where to start?
hi amit,
- To begin with, you need to understand the difference between the cosine, L2, and inner product metrics. We support L2 and inner product for now (https://www.milvus.io/docs/metric.md)
- To build an index, you specify index_params = { "metric_type":"L2", "index_type":"IVF_FLAT", "params":{"nlist":1024} }, and now we want to add another metric type, cosine.
- To support cosine, you need to change the core of Milvus (IP and cosine are almost the same, except that cosine applies to normalized vectors), but it's not that hard. Just check wherever the IP distance metric appears.
- Inside Knowhere, which is the engine of Milvus, you need a way to calculate the cosine distance metric. I think @liliu-z is working on it already.
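The parameter change described above might look like this; the "COSINE" value is the proposed addition, not an option that exists yet:

```python
# Today's index parameters: metric_type must be "L2" or "IP".
index_params_ip = {
    "metric_type": "IP",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}

# The feature request: accept a third metric_type value,
# e.g. "COSINE" (hypothetical until the feature lands).
index_params_cosine = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}
```

Until then, the workaround is to L2-normalize vectors before insertion and build the index with "IP".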
@xiaofan-luan Thank you I will take a look
Hi @amityahav, yes, we are working on this. The vector-index-related work of Milvus is in https://github.com/milvus-io/knowhere, and any contributions and questions are more than welcome.
/assign @liliu-z
@liliu-z Hi, is there any update on this issue? I'd like to work on it.
I thought @cydrain is working on it, any progress?
@xiaofan-luan Thanks for the quick reply. I looked around and found some COSINE-related code in knowhere, so it has probably been done already.
BTW, I'm looking for opportunities to contribute to open source projects, and Milvus looks like an excellent candidate. If there are any good first issues pending, I'd be happy to help. I'm a senior MLE and quite familiar with ML and embeddings. Thanks!
I think this issue can be closed, right? If the vectors are normalized (i.e. with a magnitude of 1), the dot product is exactly the same as cosine similarity, because cosine(x, y) = inner product / product of magnitudes.
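A quick numeric check of that identity:

```python
import numpy as np

x = np.array([3.0, 4.0])   # magnitude 5
y = np.array([8.0, 6.0])   # magnitude 10

# cosine(x, y) = inner product / product of magnitudes
cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))  # 48 / 50 = 0.96

# After normalizing to magnitude 1, the denominator drops out and
# the plain dot product gives the same value.
x_hat, y_hat = x / 5.0, y / 10.0
assert np.isclose(np.dot(x_hat, y_hat), cos)
```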
Hi @garyhlai, so as you said, I can use IP in the search param config if I want to calculate cosine distance, as long as my vectors are normalized, right?
Use 2.3.1, so you can directly use the cosine metric.
@xiaofan-luan I'm currently using v2.2.x (both pymilvus and the Milvus service running in Docker). Should I upgrade both, or just pymilvus? Or I think I can just change the search param to IP without upgrading, since my embedding vectors are normalized.
Use IP distance and normalize your vectors; you will get the same distance as cosine.
Below what cosine similarity will a result not be returned? 0.5? 0? Can I modify it?
You can normalize your vectors so that IP is the same as the cosine metric. The max distance will be 1.
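A small numpy check of that bound (a sketch, not Milvus code):

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.normal(size=(100, 8))
# L2-normalize every row so each vector has magnitude 1.
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Pairwise inner products of unit vectors are bounded by 1 (Cauchy-Schwarz);
# a vector paired with itself attains the maximum.
sims = vecs @ vecs.T
assert sims.max() <= 1.0 + 1e-9
assert np.allclose(np.diag(sims), 1.0)
```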
OK, thank you