fuzzysearch icon indicating copy to clipboard operation
fuzzysearch copied to clipboard

Unexpected behavior with only insertions allowed

Open artemuk opened this issue 5 years ago • 2 comments

Hi! I'm using fuzzysearch v0.7.1 and I got a bit unexpected results when only insertions are allowed. Here is the code:

matches = find_near_matches('12345', '123aa', max_deletions=0, max_substitutions=0, max_insertions=3)
print(matches)

Expected output:

[Match(start=0, end=3, dist=2, matched='123')]

I expected that substring '123' can be matched since we can insert '4' and '5' after it

Actual output

[]

But actually it doesn't return any matches.

Please tell me, am I missing something? Is this an intended behavior or a bug? Thank you in advance!

OS: Ubuntu 19.04 Setup: Python 3.7 (miniconda), fuzzysearch 0.7.1

artemuk avatar Apr 22 '20 17:04 artemuk

Hi @artemuk,

Thank you for such a clear and thorough description of the issue!

fuzzysearch uses the opposite meaning of what insertion/deletion means: It defines them relative to the searched sub-string. In your case, '12345' can be turned into '123' using 2 deletions, so allowing deletions rather than insertions gives the expected result:

>>> find_near_matches('12345', '123aa', max_deletions=3, max_substitutions=0, max_insertions=0)
[Match(start=0, end=3, dist=2, matched='123')]

In other words, find_near_matches() finds parts of the searched sequence, which can be arrived at from the searched string (e.g. '12345') by making the manipulations allowed by the search parameters.

The docs currently don't mention this at all, unfortunately, and they certainly should! I'll add something about this to the docs soon.

taleinat avatar Apr 22 '20 22:04 taleinat

Hi @taleinat,

Thank you very much for the explanation! It makes perfect sense now. I should've figured it out myself since it's actually mentioned in the docstring (now I see it :)) But yea, a bit more explicit mention of it will definitely do good!

artemuk avatar Apr 23 '20 00:04 artemuk