fuzzywuzzy
Broken partial_ratio functionality with python-Levenshtein
The partial_ratio calculation seems to yield incorrect results for certain combinations of strings when it uses the python-Levenshtein SequenceMatcher.
This works well:
> fuzz.partial_ratio('this is a test', 'is this is a not this is a test!')
> 100
But changing the longer string slightly, without touching the part that matches the target, lowers the score:
> fuzz.partial_ratio('this is a test', 'is this is a not really thing this is a test!')
> 92
Digging deeper, it appears that the get_matching_blocks() method returns blocks that do not actually match the string we are searching for, so the subsequent ratio calculations are performed on poorly matched substrings.
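One way to check this is to verify that every block really describes equal slices of the two strings. The snippet below is only a sketch: `invalid_blocks` is a hypothetical helper name, and it is demonstrated with the standard library's `difflib.SequenceMatcher`, not with the python-Levenshtein matcher. The same check should work for any object whose `get_matching_blocks()` returns `(i, j, n)` triples.

```python
from difflib import SequenceMatcher


def invalid_blocks(a, b, blocks):
    """Return the blocks (i, j, n) for which a[i:i+n] != b[j:j+n].

    A correct matcher should never produce such a block, so an empty
    list means the blocks are at least internally consistent.
    """
    bad = []
    for i, j, n in blocks:
        if a[i:i + n] != b[j:j + n]:
            bad.append((i, j, n))
    return bad


shorter = 'this is a test'
longer = 'is this is a not really thing this is a test!'

# Demonstrated with difflib only; swap in another matcher to compare.
blocks = SequenceMatcher(None, shorter, longer).get_matching_blocks()
print(blocks)
print(invalid_blocks(shorter, longer, blocks))
```

Running the same check against the blocks produced by the python-Levenshtein matcher should show whether the mismatch is in the blocks themselves or in how they are used afterwards.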
Removing python-Levenshtein and using the python-only SequenceMatcher makes that method do its job correctly. Even after trying many combinations, I couldn't figure out what it is about certain strings that makes it break.
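For comparison, here is a minimal pure-Python sketch of a block-based partial ratio built only on `difflib`. It is an approximation of the approach described above, not fuzzywuzzy's actual code, and `partial_ratio_pure` is a made-up name. It gives 100 for both example strings.

```python
from difflib import SequenceMatcher


def partial_ratio_pure(s1, s2):
    """Score the best alignment of the shorter string inside the longer one.

    Sketch only: each matching block is used as a candidate alignment,
    and the best ratio over all candidate windows is returned (0-100).
    """
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    blocks = SequenceMatcher(None, shorter, longer).get_matching_blocks()

    best = 0.0
    for i, j, _ in blocks:
        # Align the start of `shorter` with the window implied by the block.
        start = max(j - i, 0)
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


print(partial_ratio_pure('this is a test',
                         'is this is a not this is a test!'))
print(partial_ratio_pure('this is a test',
                         'is this is a not really thing this is a test!'))
```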
To top it off, the python-Levenshtein library appears to have been left unsupported for a while now. Any ideas? Maybe for now, removing the recommendation to use python-Levenshtein would let code run correctly, if not as fast? Thanks!