SequenceMatcher should have `autojunk` set to `false` for `partial_ratio`
Here SequenceMatcher is called without an explicit `autojunk` argument, so it falls back to the default of true:
https://github.com/nol13/fuzzball.js/blob/4e7393ca7af47f71abefe6071b3b7d3d82ece4ef/fuzzball.js#L932
This does not match the Python implementation of fuzzywuzzy / rapidfuzz as documented here:

    blocks = SequenceMatcher(None, needle, longer, False).get_matching_blocks()
    score = 0
    for block in blocks:
        long_start = block[1] - block[0] if (block[1] - block[0]) > 0 else 0
        long_end = long_start + len(shorter)
        long_substr = longer[long_start:long_end]
        score = max(score, fuzz.ratio(needle, long_substr))

Alternatively, it could be an option the user could pass.
Yeah, this could definitely be added as an option at the very least. If that's what their current implementation does, I can probably keep this in sync and make it the default. I think their implementation has probably diverged since this was written, when they split from the old codebase? (https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/fuzz.py#L61)
Edit: Never mind, I now see where they explain this difference: https://github.com/rapidfuzz/RapidFuzz/blob/main/api_differences.md#partial_ratio-implementation
Do you happen to have any good test cases for this? I'll add the option, but I'd like to understand the behavior better before making it the default. I'm also not sure whether the autojunk option in the difflib port I'm using works exactly like the current Python implementation.
Tbh I couldn't even find a good test case, but there's no harm in adding it as an option. I'll leave the default as true for now so existing behavior doesn't change.
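For what it's worth, here is a rough sketch of an input where Python's `difflib` autojunk heuristic seems to change the outcome. It only uses the standard library: `partial_ratio_sketch` is a made-up helper that mirrors the loop quoted above, `SequenceMatcher.ratio()` stands in for `fuzz.ratio`, and the sample strings are invented. It says nothing about how the JavaScript difflib port behaves.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(shorter: str, longer: str, autojunk: bool) -> float:
    """Best similarity (0..1) of `shorter` against same-length windows of `longer`.

    Hypothetical helper for illustration; not the fuzzywuzzy/rapidfuzz API.
    """
    matcher = SequenceMatcher(None, shorter, longer, autojunk)
    best = 0.0
    for a_start, b_start, _size in matcher.get_matching_blocks():
        # Align the window so the matched block lines up in both strings.
        window_start = max(b_start - a_start, 0)
        window = longer[window_start:window_start + len(shorter)]
        # Stand-in for fuzz.ratio; the window is short, so autojunk is moot here.
        best = max(best, SequenceMatcher(None, shorter, window, False).ratio())
    return best


needle = "abc"
# 302 characters: long enough (>= 200) for difflib's autojunk heuristic to
# treat the heavily repeated 'a', 'b' and 'c' as "popular" and ignore them.
haystack = "x" + "abc" * 100 + "y"

with_autojunk = partial_ratio_sketch(needle, haystack, autojunk=True)
without_autojunk = partial_ratio_sketch(needle, haystack, autojunk=False)
print(with_autojunk, without_autojunk)
```

With autojunk disabled the needle is found verbatim, so the score is a perfect match. With autojunk enabled, difflib drops the popular characters from its index, no real matching block is found, and the score comes out lower even though the needle occurs 100 times in the haystack.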
2.2.3 is released, so closing, but if there were an alternative JavaScript implementation that was better, that could work too.