
Returning vectors with similarity above threshold for most_similar()

Open · lucas-ubm opened this issue 3 years ago · 1 comment

In sentencevectors.py, most_similar() can return the topn most similar sentences. However, it would be useful to be able to specify a similarity threshold above which sentences are returned. To support this, topn could accept a fractional value: if topn is strictly smaller than 1, it is treated as a threshold; otherwise it works the same way as it does now.
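A minimal sketch of the proposed behaviour, written as a standalone function rather than a patch to fse. The names `most_similar_sketch` and `cosine` are hypothetical, the brute-force scoring over plain Python lists is an assumption for illustration, and none of this reflects how sentencevectors.py is actually implemented:

```python
import math
from typing import List, Sequence, Tuple, Union


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_sketch(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    topn: Union[int, float] = 10,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (index, similarity) pairs, best match first.

    topn >= 1 -> keep the int(topn) best matches (the existing behaviour).
    topn <  1 -> treat topn as a threshold and keep every match whose
                 similarity is strictly above it (the proposed behaviour).
    """
    # Score every stored vector against the query and sort by similarity.
    scored = sorted(
        ((idx, cosine(query, vec)) for idx, vec in enumerate(vectors)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if topn < 1:
        return [(idx, sim) for idx, sim in scored if sim > topn]
    return scored[: int(topn)]


if __name__ == "__main__":
    sentence_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    print(most_similar_sketch([1.0, 0.0], sentence_vectors, topn=2))
    print(most_similar_sketch([1.0, 0.0], sentence_vectors, topn=0.5))
```

One consequence of overloading topn this way is that a threshold of exactly 1 cannot be expressed, since topn=1 keeps its current meaning of "return the single best match"; a separate keyword argument would avoid that ambiguity.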

lucas-ubm · Aug 24 '20 07:08

Yes, this is absolutely correct. However, the current implementation is actually highly inefficient in terms of similarity search (brute force). I had plans to include approximate nearest neighbor search, but haven't found the time to implement it.
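To illustrate the difference between brute-force and approximate search, here is a rough sketch of one possible approximate technique, random-hyperplane locality-sensitive hashing. The comment above does not say which method was planned, so the choice of LSH, the class name `HyperplaneLSH`, and all of its parameters are assumptions made purely for illustration and are not part of fse:

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Hypothetical single-table LSH index using random hyperplanes."""

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit to a vector's hash key.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)
        ]
        self.buckets: Dict[Tuple[bool, ...], List[int]] = defaultdict(list)
        self.vectors: List[Sequence[float]] = []

    def _key(self, vec: Sequence[float]) -> Tuple[bool, ...]:
        # The key records on which side of every hyperplane the vector lies.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes
        )

    def add(self, vec: Sequence[float]) -> int:
        """Store a vector and return its index."""
        idx = len(self.vectors)
        self.vectors.append(vec)
        self.buckets[self._key(vec)].append(idx)
        return idx

    def query(self, vec: Sequence[float], topn: int = 10) -> List[Tuple[int, float]]:
        """Score only the vectors sharing the query's bucket, best match first."""
        candidates = self.buckets.get(self._key(vec), [])
        scored = sorted(
            ((i, cosine(vec, self.vectors[i])) for i in candidates),
            key=lambda pair: pair[1],
            reverse=True,
        )
        return scored[:topn]


if __name__ == "__main__":
    index = HyperplaneLSH(dim=2, n_planes=4, seed=0)
    for v in ([1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]):
        index.add(v)
    print(index.query([2.0, 0.0], topn=2))
```

Vectors pointing in similar directions tend to share a bucket, so a query compares against a small candidate set instead of every stored vector. The trade-off is that a true neighbour can land in a different bucket and be missed, which is why practical indexes typically combine several hash tables or use tree- and graph-based structures.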

oborchers · Jan 28 '21 08:01