
implemented token_sim_ratio() function with cosine similarity

Open · Exquisition opened this issue 3 years ago · 4 comments

Implements a solution to the following issue: https://github.com/seatgeek/fuzzywuzzy/issues/272

token_sim_ratio(s1, s2, ...) is designed to avoid the problems that fuzz.token_sort_ratio(s1, s2, ...) introduces by lexicographically sorting the tokens of the second string. Similarity is computed with cosine similarity; other similarity measures (e.g., the built-in Levenshtein ratio or Jaro-Winkler) could be integrated easily.
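A minimal sketch of the idea, assuming a bag-of-words approach: each string becomes a vector of token counts, and the score is the cosine of the angle between the two vectors, scaled to 0-100 like fuzzywuzzy's other ratios. Because token counts do not depend on token order, no sorting step is needed. This is not the code from the PR; the tokenizer, the 0-100 scaling, and the two-argument signature of this token_sim_ratio are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def _tokens(s):
    # Lowercase and split into runs of word characters. This only roughly
    # mirrors fuzzywuzzy's default preprocessing (assumption, not the PR's code).
    return re.findall(r"\w+", s.lower())


def token_sim_ratio(s1, s2):
    """Hypothetical sketch: cosine similarity of token-count vectors, 0-100."""
    c1, c2 = Counter(_tokens(s1)), Counter(_tokens(s2))
    if not c1 or not c2:
        # An empty string has a zero vector, so cosine is undefined; score 0.
        return 0
    # Dot product over shared tokens only; token order never matters.
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    # Product of the squared norms, square-rooted once.
    sq1 = sum(v * v for v in c1.values())
    sq2 = sum(v * v for v in c2.values())
    return round(100 * dot / math.sqrt(sq1 * sq2))
```

For example, "new york mets" and "mets new york" score 100, while "new york mets" and "new york yankees" score 67 (two of three tokens shared). Since whole tokens are compared, a misspelled token counts as a complete mismatch; a character-level measure such as Levenshtein would be needed to give partial credit for typos.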

Exquisition · Dec 17 '20 21:12