Incorrect match in fuzzy search?
I'm trying to get the byte offsets of all matches in fuzzy mode, allowing only substitutions (no insertions or deletions). I'm using ugrep 3.8.3 x86_64-pc-linux-gnu +avx2 +pcre2_jit +zlib +bzip2 +lzma.
Here is the command with a simple short text and pattern.
echo agtagatgatagatagt | ugrep --byte-offset --ungroup --fuzzy=~1 tag
These matches are returned (I enclosed each match between | characters, since they are coloured in a normal terminal):
2:ag|tag|atgatagatagt
6+agtaga|tg|atagatagt
9+agtagatga|tag|atagt
13+agtagatgataga|tag|t
The 2nd match (offset 6) is incorrect: "tg" can only be obtained from "tag" by a deletion, while I requested substitutions only. Is this a bug, or am I doing something wrong?
Thank you for the feedback. Interesting observation. I am taking a closer look. It may take a bit of time to go through logs, diagnostics and tests.
Fixed the problem:
$ echo agtagatgatagatagt | ugrep --tag='<<,>>' --byte-offset --ungroup --fuzzy=~1 tag
2:ag<<tag>>atgatagatagt
9+agtagatga<<tag>>atagt
13+agtagatgataga<<tag>>t
The problem was a branch in the fuzzy matcher that, while backtracking, assumed insertions were allowed in combination with substitutions, even when only substitutions are enabled.
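For anyone who wants to cross-check the expected output independently of ugrep, here is a minimal brute-force sketch in Python. It is not how ugrep's fuzzy matcher works internally; it assumes that "substitutions only" means a fixed-length window compared position by position (Hamming distance), and the function name substitution_only_matches is made up for this illustration.

```python
def substitution_only_matches(text, pattern, max_subs):
    """Return start offsets of windows that differ from pattern
    in at most max_subs positions (substitutions only)."""
    m = len(pattern)
    offsets = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        # Count positions where the window and the pattern disagree.
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_subs:
            offsets.append(start)
    return offsets


# Same text and pattern as in the report, at most 1 substitution.
print(substitution_only_matches("agtagatgatagatagt", "tag", 1))  # [2, 9, 13]
```

With this input, only the three exact occurrences of "tag" are within one substitution, which agrees with the offsets 2, 9 and 13 in the fixed output. Note that the sketch reports every qualifying window, including overlapping ones, and that its character offsets equal byte offsets only for ASCII input like this one.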
I will release update v3.9.1 very soon. Thanks for reporting this.