Fast_Sentence_Embeddings

Returning vectors with similarity above threshold for most_similar()

Open lucas-ubm opened this issue 4 years ago • 1 comment

In sentencevectors.py, most_similar() can return the topn most similar sentences. However, it would be useful to be able to specify a similarity threshold above which sentences are returned. For this, topn could accept a fractional value: if topn is strictly smaller than 1, it is treated as a similarity threshold; otherwise it works the same way as it does now.
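To make the proposed semantics concrete, here is a minimal sketch in plain Python. It is not the library's code: `select_similar` and `cosine` are hypothetical helpers written for illustration, and cosine similarity is assumed as the measure.

```python
import math
from typing import List, Sequence, Tuple, Union


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_similar(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    topn: Union[int, float] = 10,
) -> List[Tuple[int, float]]:
    """Hypothetical helper illustrating the proposal; not part of the library.

    topn < 1  -> interpreted as a similarity threshold (strictly above)
    topn >= 1 -> interpreted as a count of results to return
    """
    # Score every stored vector and order by descending similarity.
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if topn < 1:
        return [(i, s) for i, s in scored if s > topn]
    return scored[: int(topn)]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
above_half = select_similar([1.0, 0.0], vectors, topn=0.5)  # threshold mode
best_two = select_similar([1.0, 0.0], vectors, topn=2)      # count mode
```

One consequence of overloading a single parameter this way is that a threshold of exactly 1.0 cannot be expressed, since topn=1 means "return one result". A separate keyword argument would avoid that ambiguity.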

Aug 24 '20 07:08 lucas-ubm

Yes, this is absolutely correct. However, the current implementation is actually highly inefficient in terms of similarity search (brute force). I had plans to include approximate nearest neighbor search, but haven't found time to implement it.
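For readers unfamiliar with the term, a brute-force search compares the query against every stored vector, so its cost grows linearly with the size of the collection. The sketch below shows the general idea in plain Python. It is an illustration only, not the library's implementation, and `normalize` and `brute_force_top_n` are hypothetical names.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def brute_force_top_n(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    n: int,
) -> List[Tuple[int, float]]:
    """Hypothetical exhaustive search: one dot product per stored vector."""
    q = normalize(query)
    # Every vector is visited once, so the work is proportional to
    # len(vectors) * dimension, no matter how small n is.
    scored = (
        (sum(a * b for a, b in zip(q, normalize(v))), i)
        for i, v in enumerate(vectors)
    )
    # heapq.nlargest keeps only the n best scores instead of sorting everything.
    return [(i, s) for s, i in heapq.nlargest(n, scored)]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
top_two = brute_force_top_n([1.0, 0.0], vectors, n=2)
```

Approximate nearest neighbor methods trade a small amount of accuracy for an index that avoids visiting every vector on each query.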

Jan 28 '21 08:01 oborchers
