
Providing Byte-Offsets for Every Match

Open fabianovasi opened this issue 8 months ago • 0 comments

Feature Request Description:

Hello, I've noticed that when matches overlap, byte-offsets are provided only for the beginning of the matched part. As a result, the number of matches reported with the --count-matches flag is larger than the number of byte-offsets obtained with the -o -b flags. I suggest adding a new option, or modifying an existing one, so that users can obtain a byte-offset for every match, even when matches overlap.

Reporting every byte-offset directly would streamline workflows that need the position of each match, including overlapping ones.

Steps to Reproduce:

Text in a.txt: "012a34"

Pattern: "\p{N}{2}"

Use the regular expression to search for matches in a.txt:

hg -e "\p{N}{2}" -b -o a.txt

Result:

The number of matches obtained with the --count-matches flag is 3 ("01", "12" and "34"). It would be nice to also be able to obtain the three corresponding byte-offsets (0, 1 and 4 in this example).
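To illustrate the output I have in mind, here is a minimal Python sketch of the difference between the two behaviours on the example above. It is not how hypergrep or Hyperscan work internally; it uses Python's re module, substitutes \d for \p{N} (the re module does not support \p{N}, and the sample text is ASCII only), and the helper names overlapping_offsets and non_overlapping_offsets are made up for this illustration.

```python
import re
from typing import List


def overlapping_offsets(data: bytes, pattern: bytes) -> List[int]:
    """Start byte-offset of every match, including overlapping ones."""
    # A zero-width lookahead consumes no input, so the scan advances one
    # byte at a time and tests the pattern at every position.
    lookahead = re.compile(b"(?=" + pattern + b")")
    return [m.start() for m in lookahead.finditer(data)]


def non_overlapping_offsets(data: bytes, pattern: bytes) -> List[int]:
    """Start byte-offset of each match in an ordinary left-to-right scan."""
    # After a match, scanning resumes at the end of that match, so a match
    # starting inside a previous one is never reported.
    return [m.start() for m in re.finditer(pattern, data)]


data = b"012a34"       # contents of a.txt in the example
pattern = rb"\d{2}"    # stand-in for \p{N}{2} (assumption: ASCII digits only)

print(overlapping_offsets(data, pattern))      # [0, 1, 4]
print(non_overlapping_offsets(data, pattern))  # [0, 4]
```

The first list is the kind of output this request is asking for: one offset per counted match. The second list is only a comparison showing how an ordinary non-overlapping scan skips the match starting at offset 1.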

Thank you for considering this feature request. I appreciate your work for enabling regex pattern searches with Hyperscan.

Note: I edited this issue after realizing that the matching mechanism works with a sliding window.

fabianovasi, Oct 10 '23 22:10