fast-elasticsearch-vector-scoring

internal algorithm

Open applecv3 opened this issue 3 years ago • 8 comments

Hi, I just want to know which type of KNN (like HNSW, LSH, and so forth) you built in this plugin.

applecv3 avatar Oct 28 '20 08:10 applecv3

The plug-in uses pure cosine-similarity or dot-product to compare vectors. So the K nearest neighbors it returns are the exact K, not an approximation like those produced by LSH and similar methods.

lior-k avatar Oct 28 '20 12:10 lior-k
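
For readers following along, exact (non-approximate) K-nearest-neighbor search as described in the reply above can be sketched as follows. This is a minimal illustration in plain Python, not the plugin's actual code: the function names `cosine_similarity` and `exact_top_k`, the in-memory `docs` dictionary, and the zero-vector convention are all assumptions made for the example.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def exact_top_k(query, docs, k):
    """Score every document against the query and keep the k best.

    `docs` maps a document id to its vector (a hypothetical in-memory
    stand-in for an index). Every vector is compared, so the number of
    similarity computations grows linearly with the number of documents.
    """
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored)


docs = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
top = exact_top_k([1.0, 0.1], docs, k=2)
print([doc_id for _, doc_id in top])
```

Because every document is scored, the result is the true top K under the chosen similarity; the cost is one comparison per document.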

Thank you for your answer! Let me ask a bit more. By "pure cosine-similarity", do you mean a naive KNN search? Does it take O(N) time (where N is the number of documents scanned when computing cosine similarity)? If so, I'm not sure how your plugin is faster than the others. I saw you mentioned that "I gained this substantial speed improvement by using the lucene index directly". Is that the whole secret(?) behind why this plugin is fast?

applecv3 avatar Oct 29 '20 05:10 applecv3

Yes, it uses brute force to calculate cosine-similarity, meaning O(n). It is not faster than hnswlib, faiss, etc. It is faster than other ES plugins that did the same brute-force calculations. The only difference was using the lucene engine directly. You can see the code :-)

BTW - Amazon has an hnswlib implementation in their managed ES service. It should be much faster than this, but it has limitations.

lior-k avatar Oct 29 '20 12:10 lior-k

Thank you so much! I really appreciate it. Have a good day!

applecv3 avatar Oct 30 '20 00:10 applecv3

BTW, we use k-means with this plug-in in order to traverse only the clusters nearest to the input vector instead of the entire corpus.

lior-k avatar Oct 30 '20 21:10 lior-k
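
The k-means idea mentioned above can be sketched roughly like this. The thread does not say how the clustering is wired into the plugin, so everything here is an assumption for illustration: the centroids are taken as given (for example from an earlier k-means run), and `assign_clusters`, `clustered_top_k`, and the `n_probe` parameter are hypothetical names.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_clusters(docs, centroids):
    """Group document ids by the index of their most similar centroid."""
    clusters = {i: [] for i in range(len(centroids))}
    for doc_id, vec in docs.items():
        best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        clusters[best].append(doc_id)
    return clusters


def clustered_top_k(query, docs, centroids, clusters, k, n_probe=1):
    """Brute-force scan restricted to the n_probe clusters nearest the query."""
    nearest = heapq.nlargest(
        n_probe, range(len(centroids)), key=lambda i: cosine(query, centroids[i])
    )
    candidates = [doc_id for i in nearest for doc_id in clusters[i]]
    return heapq.nlargest(k, ((cosine(query, docs[d]), d) for d in candidates))


# Centroids are assumed to have been computed beforehand.
centroids = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [1.0, 0.1],
    "b": [0.9, 0.2],
    "c": [0.1, 1.0],
    "d": [0.2, 0.9],
}
clusters = assign_clusters(docs, centroids)
result = clustered_top_k([1.0, 0.0], docs, centroids, clusters, k=2, n_probe=1)
print([doc_id for _, doc_id in result])
```

Only the documents in the probed clusters are scored, so fewer comparisons are made than in a full scan. The trade-off is that a true neighbor sitting in a cluster that was not probed will be missed, so results are no longer guaranteed to be exact; raising `n_probe` scans more clusters at higher cost.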

@lior-k Hi,

What is the difference between this repo and the native ES vector scoring? Which one is faster?

Thanks

sctrueew avatar Jan 14 '21 15:01 sctrueew

Never tested. This plugin existed well before the official support. If you do test the performance differences, please let us all know 🙏

lior-k avatar Jan 14 '21 20:01 lior-k

@lior-k Can the plug-in's algorithm be configured? It uses brute force to calculate cosine similarity, which is not suitable for high-efficiency scenarios.

Shengwuyou avatar Mar 02 '21 03:03 Shengwuyou