fuzzywuzzy
Implemented token_sim_ratio() function with cosine similarity
Implemented solution to the following issue: https://github.com/seatgeek/fuzzywuzzy/issues/272
token_sim_ratio(s1, s2, ...) avoids the issues associated with the lexicographic sorting of tokens for the second string that fuzz.token_sort_ratio(s1, s2, ...) introduces. The similarity is calculated using cosine similarity; other similarity measures (the built-in Levenshtein, Jaro-Winkler, etc.) could be integrated easily.
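
For illustration only, here is a minimal sketch of how a cosine-similarity token ratio could work. It is not the code in this PR: the tokenization (lowercased `\w+` runs), the bag-of-words weighting, and the 0-100 integer scaling are all assumptions made for the example, and the body of `token_sim_ratio` below is hypothetical.

```python
import math
import re
from collections import Counter


def token_sim_ratio(s1, s2):
    """Hypothetical sketch: cosine similarity of token counts, scaled to 0-100."""
    # Assumed tokenization: lowercase, split on runs of word characters.
    counts1 = Counter(re.findall(r"\w+", s1.lower()))
    counts2 = Counter(re.findall(r"\w+", s2.lower()))
    if not counts1 or not counts2:
        return 0

    # Dot product over the tokens the two strings share.
    dot = sum(counts1[t] * counts2[t] for t in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))

    # Clamp to 100 to guard against floating-point overshoot.
    return min(100, int(round(100 * dot / (norm1 * norm2))))


print(token_sim_ratio("new york mets", "mets new york"))
print(token_sim_ratio("new york", "new york mets"))
```

Because the comparison is made on token counts rather than on a re-joined sorted string, token order has no effect on the score in this sketch.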