Near matches get lost with increasing values of max_l_dist

Open davidefiocco opened this issue 4 years ago • 1 comment

To reproduce: I am using fuzzysearch==0.7.3 and running

import fuzzysearch

text = "foo bar spam eggs "
query = "four"

With max_l_dist=2 I get one match:

fuzzysearch.find_near_matches(query, text, max_l_dist=2)
[Match(start=0, end=4, dist=2, matched='foo ')]

With max_l_dist=3 I get the previous match plus an additional one:

fuzzysearch.find_near_matches(query, text, max_l_dist=3)
[Match(start=0, end=4, dist=2, matched='foo '),
 Match(start=6, end=7, dist=3, matched='r')]

But with max_l_dist=4 I no longer get the previous matches, only empty ones:

fuzzysearch.find_near_matches(query, text, max_l_dist=4)
[Match(start=0, end=0, dist=4, matched=''),
 Match(start=1, end=1, dist=4, matched=''),
 Match(start=2, end=2, dist=4, matched=''),
 Match(start=3, end=3, dist=4, matched=''),
 Match(start=4, end=4, dist=4, matched=''),
 Match(start=5, end=5, dist=4, matched=''),
 Match(start=6, end=6, dist=4, matched=''),
 Match(start=7, end=7, dist=4, matched=''),
 Match(start=8, end=8, dist=4, matched=''),
 Match(start=9, end=9, dist=4, matched=''),
 Match(start=10, end=10, dist=4, matched=''),
 Match(start=11, end=11, dist=4, matched=''),
 Match(start=12, end=12, dist=4, matched=''),
 Match(start=13, end=13, dist=4, matched=''),
 Match(start=14, end=14, dist=4, matched=''),
 Match(start=15, end=15, dist=4, matched=''),
 Match(start=16, end=16, dist=4, matched=''),
 Match(start=17, end=17, dist=4, matched=''),
 Match(start=18, end=18, dist=4, matched='')]

Is this intended behaviour?

davidefiocco, Dec 23 '21 21:12

Hi @davidefiocco, apologies for the late response.

Yes, this is currently the intended behavior.

The reason is that once the maximum distance is equal to (or greater than) the length of what you're searching for (query in your example), even an empty string is a valid match.
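To make that concrete, here is a minimal sketch of the arithmetic using a plain dynamic-programming Levenshtein distance. The levenshtein function below is written only for illustration and is an assumption of this example; it is not fuzzysearch's internal implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to "" is i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


query = "four"
print(levenshtein(query, ""))      # delete all four characters
print(levenshtein(query, "foo "))  # the match reported for max_l_dist=2
print(levenshtein(query, "r"))     # the extra match reported for max_l_dist=3
```

The distance from the query to the empty string is always len(query), so with max_l_dist=4 and a 4-character query, an empty match at every position of the text is within the allowed distance.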

However, looking at your example, I can see that this behavior isn't great: there are matches with a lower distance in the text, but they are no longer returned once the maximum distance is that large.
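Until that changes, one possible caller-side workaround (a sketch only, not something the library does) is to keep the distance budget strictly below the query length, so that an empty string can never qualify as a match. The helper name capped_max_l_dist is hypothetical.

```python
def capped_max_l_dist(query: str, max_l_dist: int) -> int:
    """Hypothetical helper: cap the budget at len(query) - 1.

    An empty match costs len(query) edits, so any budget below that
    rules empty matches out.
    """
    if not query:
        raise ValueError("query must be non-empty")
    return min(max_l_dist, len(query) - 1)


query = "four"
for requested in (2, 3, 4, 10):
    print(requested, "->", capped_max_l_dist(query, requested))

# Intended use, assuming fuzzysearch is installed:
# fuzzysearch.find_near_matches(
#     query, text, max_l_dist=capped_max_l_dist(query, 4))
```

For the example in this issue, a requested max_l_dist of 4 is reduced to 3, which is the setting that returned the 'foo ' and 'r' matches above. The trade-off is that genuine matches at distance len(query) or more are dropped as well.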

I'll think about how this can be improved without complicating things.

taleinat, Mar 09 '22 13:03