fuzzywuzzy

Broken partial_ratio functionality with python-Levenshtein

Open BorisVa opened this issue 9 years ago • 10 comments

The partial_ratio calculation seems to yield incorrect results for certain combinations of strings when it uses the python-Levenshtein SequenceMatcher.

This works well:

> fuzz.partial_ratio('this is a test', 'is this is a not this is a test!')
> 100

But changing the longer string slightly, without touching the target substring, lowers the score:

> fuzz.partial_ratio('this is a test', 'is this is a not really thing this is a test!') 
> 92

Digging deeper, it appears that the get_matching_blocks() method returns blocks that do not actually match the string we are searching for, so the subsequent ratio calculations are performed on poorly matched candidate substrings of the longer string.
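To make the role of get_matching_blocks() concrete, here is a minimal sketch of a partial-ratio calculation built on the standard library's difflib.SequenceMatcher. The function name partial_ratio_sketch is hypothetical, and the logic is modeled on the behaviour described above rather than copied from fuzzywuzzy, so treat it as an illustration under those assumptions.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1, s2):
    """Best similarity (0-100) between the shorter string and any
    window of the longer string suggested by the matching blocks.
    Hypothetical helper, not fuzzywuzzy's own code."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    best = 0.0
    for a_start, b_start, _size in matcher.get_matching_blocks():
        # Shift the window so the matched block lines up in both strings.
        long_start = max(b_start - a_start, 0)
        window = longer[long_start:long_start + len(shorter)]
        score = SequenceMatcher(None, shorter, window, autojunk=False).ratio()
        best = max(best, score)
    return int(round(100 * best))


print(partial_ratio_sketch('this is a test',
                           'is this is a not this is a test!'))
print(partial_ratio_sketch('this is a test',
                           'is this is a not really thing this is a test!'))
```

With the pure-Python matcher both calls return 100, because the first block already points at the exact occurrence of the target. If a matcher reports blocks at the wrong offsets, every candidate window is misaligned and the best score drops, which would be consistent with the 92 shown above.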

Removing python-Levenshtein and falling back to the pure-Python SequenceMatcher makes get_matching_blocks() do its job correctly. I tried a whole bunch of strings but couldn't figure out what it is about certain ones that makes it break.
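One way to narrow this down is to check each reported block against the strings themselves. The sketch below uses difflib only; invalid_blocks is a hypothetical helper, and the idea that it could be pointed at any other matcher exposing get_matching_blocks() is an assumption, not something tested here.

```python
from difflib import SequenceMatcher


def invalid_blocks(a, b, blocks):
    """Return the (i, j, n) blocks where a[i:i+n] differs from b[j:j+n].
    Hypothetical diagnostic helper."""
    return [(i, j, n) for i, j, n in blocks if a[i:i + n] != b[j:j + n]]


a = 'this is a test'
b = 'is this is a not really thing this is a test!'
blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()

# An empty list means every block is a genuine common substring.
print(invalid_blocks(a, b, blocks))
```

For the pure-Python matcher this prints an empty list. Running the same check on blocks produced by the python-Levenshtein-backed matcher would show whether the blocks themselves are wrong or merely chosen differently.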

To top it off, the python-Levenshtein library appears to have been unmaintained for a while now. Any ideas? Maybe, for now, removing the recommendation to use python-Levenshtein would let code run correctly, if not as fast? Thanks!

BorisVa • Feb 23 '15 23:02