fuzzywuzzy
Sort matches by common letter count, largest to smallest
This pull request addresses the problem in this issue: https://github.com/seatgeek/fuzzywuzzy/issues/280
All the code changes + unit tests are in process.py and test_fuzzywuzzy.py.
All old and new test cases in test_fuzzywuzzy.py pass.
Overall, I am personally not really convinced that this should be added at all, for two reasons:
- It adds more arguments, which makes it increasingly hard to use the function correctly.
- In my opinion, this does not belong in process.*. With token_set_ratio these results really are all equally similar, so taking the first one is well-defined behaviour. A user who wants to prefer matches that, for example, have many characters in common should use a different scorer that combines the results of multiple string metrics. A good example is fuzz.WRatio, which is implemented as a separate scorer that combines multiple metrics.
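
To make the second point concrete, here is a minimal sketch of what such a combined scorer could look like. It is not code from this pull request or from fuzzywuzzy: `common_letter_count`, `combined_scorer`, and the `overlap_weight` parameter are hypothetical names, and `difflib.SequenceMatcher` stands in for a fuzzywuzzy ratio so the snippet runs with the standard library alone.

```python
from collections import Counter
from difflib import SequenceMatcher


def common_letter_count(a, b):
    """Count the characters a and b share, respecting multiplicity."""
    return sum((Counter(a) & Counter(b)).values())


def combined_scorer(query, choice, overlap_weight=0.2):
    """Hypothetical scorer: blend a similarity ratio with character overlap.

    Returns an integer in the range 0-100, like the fuzz.* scorers.
    overlap_weight is an illustrative assumption, not a tuned value.
    """
    # Stand-in base metric; a fuzz.* ratio scaled to 0-1 could be used instead.
    base = SequenceMatcher(None, query, choice).ratio()
    longest = max(len(query), len(choice))
    # Normalise the shared-character count by the longer string's length.
    overlap = common_letter_count(query, choice) / longest if longest else 0.0
    return round(100 * ((1 - overlap_weight) * base + overlap_weight * overlap))
```

A function with this shape could presumably be passed through the existing `scorer` argument of `process.extract`, so the preference for matches with more shared characters lives in the scorer and `process.*` needs no extra arguments.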