fuzzball.js

SequenceMatcher should have `autojunk` set to `false` for `partial_ratio`

Open wesbarnett opened this issue 10 months ago • 2 comments

Here SequenceMatcher is called without an explicit autojunk argument, so it falls back to the default of true:

https://github.com/nol13/fuzzball.js/blob/4e7393ca7af47f71abefe6071b3b7d3d82ece4ef/fuzzball.js#L932

This does not match the Python implementation of fuzzywuzzy / rapidfuzz as documented here:

blocks = SequenceMatcher(None, needle, longer, False).get_matching_blocks()
score = 0
for block in blocks:
    long_start = block[1] - block[0] if (block[1] - block[0]) > 0 else 0
    long_end = long_start + len(shorter)
    long_substr = longer[long_start:long_end]
    score = max(score, fuzz.ratio(needle, long_substr))

Alternatively, it could be an option the user could pass.

wesbarnett, May 15 '25 14:05
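For context, here is a minimal sketch of what the autojunk heuristic does in Python's own `difflib` (not in the fuzzball.js port, whose behavior may differ). When the second sequence has 200 or more elements, any element that makes up more than roughly 1% of it is treated as "popular" and is no longer used to seed matches. The strings below are made up for illustration.

```python
from difflib import SequenceMatcher

# Illustrative inputs (assumption, not taken from the issue): a 204-character
# haystack built only from 'a' and 'b', so both characters count as "popular".
needle = "abba"
longer = "ab" * 50 + needle + "ab" * 50

sm_on = SequenceMatcher(None, needle, longer)                   # autojunk=True (default)
sm_off = SequenceMatcher(None, needle, longer, autojunk=False)

# bpopular lists the elements the heuristic decided to ignore as match seeds.
print(sm_on.bpopular)
print(sm_off.bpopular)

# With autojunk off, the exact occurrence of the needle at offset 100 is found.
blocks_on = sm_on.get_matching_blocks()
blocks_off = sm_off.get_matching_blocks()
print(blocks_on)
print(blocks_off)
```

With the default setting the exact occurrence in the middle of `longer` is missed, which is the kind of divergence the issue describes.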

Ya this could definitely at the very least be added as an option. If that's what their current implementation is doing I can probably keep this in sync and make it default. I think their implementation has probably diverged since this was written when they split from the old codebase? (https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/fuzz.py#L61)

Edit: Nm I see where they explain this difference here now https://github.com/rapidfuzz/RapidFuzz/blob/main/api_differences.md#partial_ratio-implementation

nol13, May 21 '25 02:05

Do you happen to have any good test cases for this? I will add the option, but I wanted to better understand the behavior before making it the default. I'm also not sure whether the autojunk option in the difflib port I'm using works exactly like the current Python implementation.

nol13, Jun 05 '25 16:06
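One possible test case, sketched in Python against the standard-library `difflib` rather than fuzzball.js. It assumes a simplified `partial_ratio` modeled on the snippet quoted in the issue, with a hypothetical `ratio` helper standing in for `fuzz.ratio`; the input strings are invented so that every character in the longer string is "popular". Whether the JavaScript difflib port gives the same numbers is not verified here.

```python
from difflib import SequenceMatcher


def ratio(s1, s2):
    """Stand-in for fuzz.ratio (assumption): difflib similarity scaled to 0-100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def partial_ratio(s1, s2, autojunk=True):
    """Simplified partial_ratio following the snippet quoted in the issue."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    blocks = SequenceMatcher(None, shorter, longer, autojunk).get_matching_blocks()
    score = 0
    for block in blocks:
        # Align the window in `longer` with where the block says `shorter` starts.
        long_start = max(block.b - block.a, 0)
        long_substr = longer[long_start:long_start + len(shorter)]
        score = max(score, ratio(shorter, long_substr))
    return score


# The needle occurs exactly once, in the middle of a 204-character haystack
# made only of 'a' and 'b', so autojunk discards both characters as seeds.
needle = "abba"
haystack = "ab" * 50 + needle + "ab" * 50

score_default = partial_ratio(needle, haystack)                    # autojunk=True
score_no_autojunk = partial_ratio(needle, haystack, autojunk=False)
print(score_default, score_no_autojunk)
```

In CPython the exact substring is only located with `autojunk=False`, so that call should score 100 while the default call scores lower. A port that reproduces the heuristic faithfully would be expected to show the same gap.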

Tbh I couldn't even find a good test case, but no harm in adding it as an option. Will leave the default to be true for now to not change behavior.

nol13, Aug 15 '25 00:08

2.2.3 is released, so closing, but if there were an alternative JavaScript implementation that was better, that could work too.

nol13, Aug 20 '25 03:08