coreutils
Replace Oniguruma
Currently we use Oniguruma for expr because it supports the GNU regex syntax. Ideally, we would use something like the Rust regex crate, but AFAICT this means either converting the input pattern so that it conforms to the regex crate's syntax, or forking regex-syntax so that it supports GNU regexes.
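To make the "convert the input" option concrete, here is a rough sketch of what such a translation could look like. The function name `bre_to_rust_regex` is hypothetical and not part of uutils or the regex crate. It assumes GNU-style BRE input, where `\(`, `\)`, `\{`, `\}`, `\|`, `\+` and `\?` are operators and their unescaped forms are literals. It deliberately rejects bracket expressions and backreferences, and it does not try to cover every corner of GNU's behaviour.

```rust
/// Hypothetical helper (not part of uutils): translate a GNU-style BRE into
/// the syntax used by the `regex` crate. Returns Err for constructs that this
/// sketch cannot express.
fn bre_to_rust_regex(bre: &str) -> Result<String, String> {
    let cs: Vec<char> = bre.chars().collect();
    let mut out = String::new();
    // True at the start of the pattern and right after `\(` or `\|`, where
    // BRE treats `*` as a literal and `^` as an anchor.
    let mut at_start = true;
    let mut i = 0;
    while i < cs.len() {
        let c = cs[i];
        i += 1;
        let mut next_at_start = false;
        match c {
            '\\' => {
                let e = *cs.get(i).ok_or("trailing backslash")?;
                i += 1;
                match e {
                    // Escaped in BRE means "operator"; emit it unescaped.
                    '(' | '|' => {
                        out.push(e);
                        next_at_start = true;
                    }
                    ')' | '{' | '}' | '+' | '?' => out.push(e),
                    '1'..='9' => {
                        return Err(format!("backreference \\{e} has no equivalent"));
                    }
                    // Other escapes are passed through unchanged; this sketch
                    // does not check that they mean the same in both syntaxes.
                    _ => {
                        out.push('\\');
                        out.push(e);
                    }
                }
            }
            '[' => return Err("bracket expressions are not handled in this sketch".into()),
            // A leading `*` is a literal in BRE.
            '*' if at_start => out.push_str("\\*"),
            // `^` is only an anchor at the start of a pattern or group.
            '^' if !at_start => out.push_str("\\^"),
            // `$` is only an anchor at the end, or before `\)` or `\|`.
            '$' if !(i == cs.len()
                || (cs.get(i) == Some(&'\\')
                    && matches!(cs.get(i + 1).copied(), Some(')' | '|')))) =>
            {
                out.push_str("\\$")
            }
            // Unescaped in BRE means "literal"; escape it for the regex crate.
            '(' | ')' | '{' | '}' | '|' | '+' | '?' => {
                out.push('\\');
                out.push(c);
            }
            _ => out.push(c),
        }
        at_start = next_at_start;
    }
    Ok(out)
}
```

With this sketch, `\(a\|b\)\+` would become `(a|b)+`, while a literal `a+b` would become `a\+b`. A pattern such as `\(a\)\1` is rejected rather than silently mistranslated.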
By design, the regex crate lacks lookaround and back-references:
Its syntax is similar to Perl-style regular expressions, but lacks a few features like look around and backreferences. In exchange, all searches execute in linear time with respect to the size of the regular expression and search text.
The fancy-regex crate supports those; I don't know how close it is to GNU syntax.
Ah, I forgot that GNU BRE has backreferences. I'm a little wary of using fancy-regex because it has practically no documentation and is not updated very frequently.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Reopening this because I still want to replace onig. The fancy-regex crate looks pretty good now, I think. It does not necessarily need to be updated that much, because it also uses the standard regex under the hood. I'm honestly also fine with exploring whether we can use the regex crate and see whether anyone really relies on the fancy features, but that's maybe a bit risky.
The reason for my renewed interest is that onig takes a long time to build:
I have a plan for this: we could depend on the regex-syntax crate and do our own parsing, because expr requires a very specific syntax for regexes. I don't think this would be too difficult.
I've asked for advice on the regex repository: https://github.com/rust-lang/regex/discussions/1126 (no reply yet, since I only just posted it)