Incorrect match in fuzzy search?

Open blinard-BIOINFO opened this issue 3 years ago • 0 comments

I'm trying to get the byte offsets of all matches in fuzzy mode, allowing only substitutions (no insertions or deletions). I'm using ugrep 3.8.3 x86_64-pc-linux-gnu +avx2 +pcre2_jit +zlib +bzip2 +lzma.

Here is the command with a simple short text and pattern.

echo agtagatgatagatagt | ugrep --byte-offset --ungroup --fuzzy=~1 tag

These matches are returned (I enclosed each match between | characters, since they are only coloured in a normal terminal):

2:ag|tag|atgatagatagt
6+agtaga|tg|atagatagt
9+agtagatga|tag|atagt
13+agtagatgataga|tag|t

The 2nd match (offset 6) is incorrect: it shows a deletion, although I requested only substitutions. Is this a bug, or am I doing something wrong?

blinard-BIOINFO, Aug 01 '22 11:08

Thank you for the feedback. Interesting observation. I am taking a closer look. This may take a bit of time to run logs, diagnostics and tests.

genivia-inc, Aug 12 '22 18:08

Fixed the problem:

$ echo agtagatgatagatagt | ugrep --tag='<<,>>' --byte-offset --ungroup --fuzzy=~1 tag
2:ag<<tag>>atgatagatagt
9+agtagatga<<tag>>atagt
13+agtagatgataga<<tag>>t

The problem was a branch in the fuzzymatcher that, while backtracking, assumed insertions were allowed in combination with substitutions, even when only substitutions are enabled.
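As an independent cross-check of the expected result, a substitution-only fuzzy match can be modelled as a fixed-length window whose Hamming distance to the pattern is at most the allowed number of substitutions. The sketch below is not ugrep's implementation; the function name `substitution_only_offsets` and its parameters are hypothetical, and it assumes ASCII input so that character offsets equal byte offsets.

```python
def substitution_only_offsets(text, pattern, max_subs=1):
    """Return start offsets of every window of len(pattern) characters
    that differs from pattern in at most max_subs positions.

    Hypothetical helper for illustration only: it models substitution-only
    fuzzy matching as a Hamming-distance check and is not how ugrep works
    internally.
    """
    m = len(pattern)
    offsets = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        # Count positions where the window and the pattern disagree.
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_subs:
            offsets.append(start)
    return offsets


if __name__ == "__main__":
    print(substitution_only_offsets("agtagatgatagatagt", "tag"))  # [2, 9, 13]
```

Under this model every match has exactly the pattern's length, so the two-character match "tg" at offset 6 cannot arise from substitutions alone, and the three-character window "tga" starting there differs from "tag" in two positions.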

genivia-inc, Aug 14 '22 20:08

I will release update v3.9.1 very soon. Thanks for reporting this.

genivia-inc, Aug 14 '22 21:08