Documentation for hir::Repetition is misleading
Consider the following code:
use regex_syntax::hir;

fn main() {
    // Is this: xab+y or x(ab)+y?
    let xabplusy = hir::Hir::concat(vec![
        hir::Hir::literal(hir::Literal::Unicode('x')),
        hir::Hir::repetition(hir::Repetition {
            kind: hir::RepetitionKind::OneOrMore,
            greedy: true,
            hir: Box::new(hir::Hir::concat(vec![
                hir::Hir::literal(hir::Literal::Unicode('a')),
                hir::Hir::literal(hir::Literal::Unicode('b')),
            ])),
        }),
        hir::Hir::literal(hir::Literal::Unicode('y')),
    ]);
    let regex = xabplusy.to_string();
    eprintln!("{}", regex);
}
Running the above code yields: xab+y
The documentation for Repetition says:
hir: Box<Hir>
The expression being repeated.
This leads me to believe that the whole Hir will be repeated (in this case ab). But, at least in the case of a Hir::concat, it appears that only the last Hir in the vector (b) is repeated.
Is this an issue with how the Hir is printed? Looking at Writer::visit_pre https://github.com/rust-lang/regex/blob/04e025b86144bbdf41425fef4a1d06161dc645d7/regex-syntax/src/hir/print.rs#L86-L91 and Writer::visit_post https://github.com/rust-lang/regex/blob/04e025b86144bbdf41425fef4a1d06161dc645d7/regex-syntax/src/hir/print.rs#L173-L180, the expression inside a repetition should probably be wrapped in parentheses when it is printed.
It looks like alternations have the same issue:
use regex_syntax::hir;

fn main() {
    // Is this: xab+y or x(ab)+y?
    let xabplusy = hir::Hir::concat(vec![
        hir::Hir::literal(hir::Literal::Unicode('x')),
        hir::Hir::repetition(hir::Repetition {
            kind: hir::RepetitionKind::OneOrMore,
            greedy: true,
            hir: Box::new(hir::Hir::concat(vec![
                hir::Hir::literal(hir::Literal::Unicode('a')),
                hir::Hir::literal(hir::Literal::Unicode('b')),
            ])),
        }),
        hir::Hir::alternation(vec![
            hir::Hir::concat(vec![
                hir::Hir::literal(hir::Literal::Unicode('f')),
                hir::Hir::literal(hir::Literal::Unicode('g')),
            ]),
            hir::Hir::literal(hir::Literal::Unicode('h')),
        ]),
        hir::Hir::literal(hir::Literal::Unicode('x')),
    ]);
    let regex = xabplusy.to_string();
    eprintln!("{}", regex);
}
This yields: xab+fg|hx
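To make the expected behavior concrete, here is a minimal sketch of a printer that adds a non-capturing group wherever dropping it would change the meaning. The `Node` enum and the `render` function are hypothetical stand-ins written for this illustration; they are not regex-syntax's types, and this is not how the actual fix is implemented. Literals are printed as-is, so metacharacters would still need escaping.

```rust
// Toy stand-in for an HIR. These are NOT regex-syntax's real types.
enum Node {
    Literal(char),
    Concat(Vec<Node>),
    Alternation(Vec<Node>),
    OneOrMore(Box<Node>),
}

// Render a node, inserting `(?:...)` where precedence requires it.
fn render(node: &Node) -> String {
    match node {
        Node::Literal(c) => c.to_string(),
        Node::Concat(items) => items
            .iter()
            .map(|n| match n {
                // Alternation binds looser than concatenation.
                Node::Alternation(_) => format!("(?:{})", render(n)),
                _ => render(n),
            })
            .collect(),
        Node::Alternation(items) => {
            items.iter().map(render).collect::<Vec<_>>().join("|")
        }
        Node::OneOrMore(inner) => match **inner {
            // A single literal can take the operator directly.
            Node::Literal(_) => format!("{}+", render(inner)),
            // Anything else is grouped so `+` applies to all of it.
            _ => format!("(?:{})+", render(inner)),
        },
    }
}

// Helper for the examples: the repetition `(ab)+` built without a group node.
fn ab_plus() -> Node {
    Node::OneOrMore(Box::new(Node::Concat(vec![
        Node::Literal('a'),
        Node::Literal('b'),
    ])))
}

fn main() {
    let first = Node::Concat(vec![Node::Literal('x'), ab_plus(), Node::Literal('y')]);
    println!("{}", render(&first)); // x(?:ab)+y

    let second = Node::Concat(vec![
        Node::Literal('x'),
        ab_plus(),
        Node::Alternation(vec![
            Node::Concat(vec![Node::Literal('f'), Node::Literal('g')]),
            Node::Literal('h'),
        ]),
        Node::Literal('x'),
    ]);
    println!("{}", render(&second)); // x(?:ab)+(?:fg|h)x
}
```

With grouping added this way, the two trees from the examples above would print as x(?:ab)+y and x(?:ab)+(?:fg|h)x, which parse back to the same structure.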
Yeah I think this is probably a bug in the printer. I don't think this kind of ambiguity is produced when roundtripping from the concrete syntax, because groups (even non-capturing groups) aren't flattened. Arguably they should be via the Hir's smart constructors, which would surface this bug more readily.
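The point about roundtripping can be illustrated with a small sketch. It assumes, as described above, that parsed groups stay in the tree as their own nodes; the `Node` enum and `naive_print` function are hypothetical and only mimic the shape of the problem, they are not regex-syntax's API. A printer that never adds parentheses itself is harmless as long as a group node is present, and only goes wrong on a hand-built tree that has none.

```rust
// Toy stand-in for an HIR. These are NOT regex-syntax's real types.
enum Node {
    Literal(char),
    Concat(Vec<Node>),
    Group(Box<Node>), // explicit non-capturing group kept in the tree
    OneOrMore(Box<Node>),
}

// A naive printer: it emits parentheses only for explicit Group nodes.
fn naive_print(node: &Node) -> String {
    match node {
        Node::Literal(c) => c.to_string(),
        Node::Concat(items) => items.iter().map(naive_print).collect(),
        Node::Group(inner) => format!("(?:{})", naive_print(inner)),
        Node::OneOrMore(inner) => format!("{}+", naive_print(inner)),
    }
}

// Helper: the concatenation `ab`.
fn ab() -> Node {
    Node::Concat(vec![Node::Literal('a'), Node::Literal('b')])
}

fn main() {
    // Shape assumed for a parsed `(?:ab)+`: the group survives as a node.
    let parsed = Node::OneOrMore(Box::new(Node::Group(Box::new(ab()))));
    // Shape built by hand, as in the report: no group node at all.
    let by_hand = Node::OneOrMore(Box::new(ab()));

    println!("{}", naive_print(&parsed)); // (?:ab)+
    println!("{}", naive_print(&by_hand)); // ab+
}
```

If constructors flattened the group away, the parsed tree would look like the hand-built one, and the ambiguous output would show up on ordinary roundtrips too.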
This ended up being a duplicate of #516. And a fix is incoming. See https://github.com/rust-lang/regex/issues/516#issuecomment-1230606293 for more details.
Thanks for taking the time to work on this.