Fast_Sentence_Embeddings
Returning vectors with similarity above threshold for most_similar()
In sentencevectors.py, most_similar() can return the topn most similar sentences. However, it would be useful to be able to specify a similarity threshold above which sentences are returned. For this, topn could accept a fractional value: if topn is strictly smaller than 1, it is treated as a threshold; otherwise it works the same way as it does now.
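To make the proposal concrete, here is a minimal sketch of the intended semantics. It is not the code in sentencevectors.py: the function name most_similar_sketch, its arguments, and the plain NumPy brute-force cosine search are all assumptions made for illustration, and it assumes no vector is all zeros.

```python
import numpy as np


def most_similar_sketch(query, vectors, topn=10):
    """Hypothetical helper illustrating the proposal; not part of the library.

    query:   1-D array, the query sentence vector (assumed non-zero).
    vectors: 2-D array, one sentence vector per row (assumed non-zero rows).
    topn:    if >= 1, the number of results; if < 1, a similarity threshold.
    """
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)

    # Normalise so that the dot product equals the cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = unit @ q

    # Brute-force ranking: indices sorted by descending similarity.
    order = np.argsort(-sims)

    if topn < 1:
        # Fractional value: keep every sentence strictly above the threshold.
        order = order[sims[order] > topn]
    else:
        # Integer-like value: keep the usual top-n behaviour.
        order = order[: int(topn)]

    return [(int(i), float(sims[i])) for i in order]
```

For example, most_similar_sketch(q, vecs, topn=0.5) would return every sentence whose cosine similarity to q exceeds 0.5, while topn=5 would return the five closest. One consequence of overloading topn this way is that a threshold of exactly 1 or higher cannot be expressed, and negative values are also read as thresholds.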
Yes, this is absolutely correct. However, the current implementation is actually highly inefficient in terms of similarity search (brute force). I had plans to include approximate nearest neighbor search, but haven't found the time to implement it.