
Allow regexes to contain ␀ byte

Open · lluchs opened this issue 7 years ago • 6 comments

As previously noted in #359, vis has some trouble with searching in binary files. I noticed that the optional TRE library does actually support NUL bytes in the regex, and started modifying vis to make use of that.

With this patch, it's possible to search for NUL bytes by yanking such a pattern into the "/ register and then using n/N. It's not possible yet to enter a pattern containing NUL bytes directly, as the command line works with zero-terminated strings. Actually entering the pattern works fine, but it discards everything after the NUL byte. I haven't figured out how to change that yet.

lluchs, Jun 19 '18 20:06

@rnpnr I really don’t know what to do with this one. Do we really care about binary files in vis?

mcepl, May 28 '24 19:05

> Do we really care about binary files in vis?

I'm not sure. It definitely doesn't work well right now. As for this patch, it's not very intrusive, and any time I can use strings with a length instead of the NUL-terminated mistake, I prefer it.

> Actually entering the pattern works fine, but it discards everything after the NUL byte. I haven't figured out how to change that yet.

I think you probably need to use tre_regnexec() instead of tre_regexec() when you actually go to perform the match. That also means that the length needs to be specified, but that shouldn't be a problem. I would also prefer that this patch was modified so that tre_regncomp() is always used in text_regex_compile() and therefore the length is always required.

@lluchs, if you are still interested in this, I will look at any updates; otherwise, someone else can take this over. I don't think any of these changes should be too complicated.

rnpnr, May 30 '24 12:05

Thanks for having a look at this PR! I rebased it onto current master.

> I think you probably need to use tre_regnexec() instead of tre_regexec() when you actually go to perform the match.

text_regex_match is only used once, and there on a null-terminated string: https://github.com/martanne/vis/blob/a7aac1044856abc4d1f133c6563fc604d7fe6295/sam.c#L1590

The actual matching in the text is already null-byte-safe (see str_next_char).

> I would also prefer that this patch was modified so that tre_regncomp() is always used in text_regex_compile() and therefore the length is always required.

I've done that now, and / commands containing null bytes work as well. With that, I think everything works correctly.

lluchs, Jun 14 '24 18:06