fuzzysearch

find_near_matches() uses substitution for distance calculation even if max_substitutions=0 is set.

Open · markussteindl opened this issue 2 years ago · 1 comment

Reproduction:

```python
from fuzzysearch import find_near_matches

find_near_matches('Hello world', 'Hello babab', max_substitutions=0, max_l_dist=5)
# [Match(start=0, end=11, dist=5, matched='Hello babab')]
```

Is this intended behavior? Without substitutions, turning "world" into "babab" takes 5 deletions plus 5 insertions, so the distance should be 10, not 5. Thus the call above should not return any matches.
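To make the two numbers in this report concrete, here is a minimal pure-Python sketch (it does not use fuzzysearch) that computes the distance between the full pattern and the matched text in two ways: as the standard Levenshtein distance, and as a variant where substitutions are disallowed, so each mismatched character costs a deletion plus an insertion. The function name `edit_distance` and its `allow_substitutions` flag are illustrative only and are not part of fuzzysearch.

```python
def edit_distance(a: str, b: str, allow_substitutions: bool = True) -> int:
    """Fewest edits turning a into b (illustrative helper, not fuzzysearch API).

    allow_substitutions=True  -> standard Levenshtein distance.
    allow_substitutions=False -> only insertions and deletions are allowed,
                                 so a mismatched pair costs 2 edits.
    """
    # prev[j] = distance between the current prefix of a and b[:j];
    # for the empty prefix of a, that is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)  # a[:i] -> "" needs i deletions
        for j, cb in enumerate(b, start=1):
            best = min(prev[j] + 1,       # delete ca
                       cur[j - 1] + 1)    # insert cb
            if ca == cb:
                best = min(best, prev[j - 1])      # characters match, no cost
            elif allow_substitutions:
                best = min(best, prev[j - 1] + 1)  # substitute ca -> cb
            cur[j] = best
        prev = cur
    return prev[-1]


print(edit_distance('Hello world', 'Hello babab'))                             # 5
print(edit_distance('Hello world', 'Hello babab', allow_substitutions=False))  # 10
```

Under these assumptions the first call gives 5, the value reported as `dist`, and the second gives 10, the value expected when substitutions are excluded.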

markussteindl · Jun 18 '22 13:06

Hey @Stonatus,

This is currently the intended behavior, yes. The `dist` attribute of matches is the Levenshtein (edit) distance, which is indeed 5 in this case.

I can see that it could be useful to report the number of changes needed using only the kinds of edits allowed by the input parameters. I'll leave this open while I consider whether there's a neat way to implement this within the existing design.
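One way such a count could be computed is sketched below, under stated assumptions. The hypothetical helper `distance_with_sub_budget` (not a fuzzysearch function) returns the fewest edits turning the pattern into a matched string when at most `max_substitutions` of those edits may be substitutions. It is the usual Levenshtein dynamic program with an extra dimension tracking how many substitutions have been used. It deliberately ignores separate limits on insertions and deletions.

```python
def distance_with_sub_budget(a: str, b: str, max_substitutions: int) -> int:
    """Hypothetical helper: fewest edits turning a into b when at most
    max_substitutions of them are substitutions (the rest are insertions
    or deletions). Per-type insertion/deletion limits are NOT enforced."""
    if max_substitutions < 0:
        raise ValueError("max_substitutions must be >= 0")
    S = max_substitutions
    # prev[j][k] = min edits for (previous prefix of a) -> b[:j]
    # using at most k substitutions. For the empty prefix: j insertions.
    prev = [[j] * (S + 1) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # a[:i] -> "" needs i deletions, whatever the budget.
        cur = [[i] * (S + 1)] + [[0] * (S + 1) for _ in range(len(b))]
        for j in range(1, len(b) + 1):
            for k in range(S + 1):
                best = min(prev[j][k] + 1,       # delete a[i-1]
                           cur[j - 1][k] + 1)    # insert b[j-1]
                if a[i - 1] == b[j - 1]:
                    best = min(best, prev[j - 1][k])           # match
                elif k > 0:
                    best = min(best, prev[j - 1][k - 1] + 1)   # spend one substitution
                cur[j][k] = best
        prev = cur
    return prev[len(b)][S]


print(distance_with_sub_budget('Hello world', 'Hello babab', 0))  # 10
print(distance_with_sub_budget('Hello world', 'Hello babab', 2))  # 8
print(distance_with_sub_budget('Hello world', 'Hello babab', 5))  # 5
```

With a budget of 0 this reduces to insertion/deletion-only distance, and with a budget at least as large as the shorter string's length it equals the plain Levenshtein distance. A post-filter that recomputes this value for each result's `matched` text and compares it with `max_l_dist` would be one hypothetical way to apply it; the extra dimension costs O(len(a) · len(b) · (max_substitutions + 1)) time.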

taleinat · Jul 14 '22 18:07