Order of `or` parser affects parse success.
I noticed a case where changing the order of parsers combined with an or affects whether an expression can be parsed. This is on version 0.10.1.
From what I understood based on the documentation, if either parser can parse the input then the or should succeed.
I was able to reduce my code to this example. Note the comment BUG: for the lines that show the buggy behavior.
#[derive(Debug, Clone)]
pub enum Val {
Integer(i64),
}
#[derive(Debug, Clone)]
pub enum Exp {
Val(Val),
Neg(Box<Exp>),
Mul(Box<Exp>, Box<Exp>),
Div(Box<Exp>, Box<Exp>),
}
fn parse_order_bug_parser<'a>() -> impl Parser<'a, &'a str, Exp> {
let expr = recursive(|expr| {
let int = text::int(10)
.map(|s: &str| Exp::Val(Val::Integer(s.parse().unwrap())))
.padded();
let atom =
int
.or(expr.clone().delimited_by(just('('), just(')')));
let op = |c| just(c).padded();
let unary = op('-')
.repeated()
.foldr(atom.clone(), |_op, rhs| Exp::Neg(Box::new(rhs)));
let product = atom.clone()
.foldl(
op('*').to(Exp::Mul as fn(_, _) -> _)
.or(op('/').to(Exp::Div as fn(_, _) -> _))
.then(atom.clone()).repeated(),
|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs))
);
// BUG: This order fails.
unary.or(product)
// However, this order succeeds.
//product.or(unary)
});
expr.then_ignore(end())
}
#[test]
fn parse_order_bug() {
let result = parse_order_bug_parser().parse("5 * (----5)");
println!("{:?}", result);
result.unwrap();
}
The unary parser will always succeed when it sees the first character (5), because it is a valid unary expression.
You're probably looking to write a precedence climbing parser, in which case your product parser should invoke your unary parser.
Additionally, .then_ignore(end()) is no longer required as of 0.10 :)
Thanks for the help! I'm not sure I understand your response.
The string I'm trying to parse is 5 * (----5) which can only be a product.
If I parse this using unary.or(product) the result is a failure which is strange. I expect the unary part to fail because of the * character. Then the product part should succeed and look like Mul(Value(5), Neg(Neg(Neg(Neg(5))))). The final result should be success.
If I parse this using the line product.or(unary) then the result is success as expected.
Would we ever expect a parser a.or(b) to fail/succeed differently from b.or(a)?
The problem you have here is that your input is not just a product: 5 is, by itself, a perfectly valid unary too. So the parser sees that, decides it's found the correct solution, and then continues onward. Usually when parsing operators like this, you nest them together according to precedence (or use the pratt combinator if you want an easier time) by having your product parser look for unary as its operands rather than having a big or chain at the bottom with ambiguous precedence.
Try this:
#[derive(Debug, Clone)]
pub enum Val {
Integer(i64),
}
#[derive(Debug, Clone)]
pub enum Exp {
Val(Val),
Neg(Box<Exp>),
Mul(Box<Exp>, Box<Exp>),
Div(Box<Exp>, Box<Exp>),
}
fn parse_order_bug_parser<'a>() -> impl Parser<'a, &'a str, Exp> {
let expr = recursive(|expr| {
let int = text::int(10)
.map(|s: &str| Exp::Val(Val::Integer(s.parse().unwrap())))
.padded();
let atom =
int
.or(expr.clone().delimited_by(just('('), just(')')));
let op = |c| just(c).padded();
let unary = op('-')
.repeated()
.foldr(atom.clone(), |_op, rhs| Exp::Neg(Box::new(rhs)));
let product = unary.clone()
.foldl(
op('*').to(Exp::Mul as fn(_, _) -> _)
.or(op('/').to(Exp::Div as fn(_, _) -> _))
.then(unary.clone()).repeated(),
|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs))
);
product
});
expr
}
#[test]
fn parse_order_bug() {
let result = parse_order_bug_parser().parse("5 * (----5)");
println!("{:?}", result);
result.unwrap();
}
I see what you're saying. I guess I misunderstood the docs. I'm referring to the docs here where it says Parse one thing or, on failure, another thing.
So what I thought that meant is that if the order is unary.or(product) the parser will first attempt unary. It will see 5 and then as you say "decides it's found the correct solution, and then continues onward." It will now see spurious characters and then fail. The or(product) part will then get evaluated and succeed.
So I guess it is more correct to understand that the or() is treating two kinds of failures differently:
- If the failure is a character that doesn't match the sub-parser (
unaryin this case) then the lhs will fail and trigger the rhs. - If the failure is from spurious characters at the end of the input then the lhs will succeed, but the larger
.or()clause will fail.
Is that right?
Parsers are lazy when evaluating sections of input. unary will succeed on 5, and then it'll pass control on to whatever follows. Unfortunately, in your original parser, nothing follows unary - so the parser will generate an error when it doesn't find the end of input.