chumsky icon indicating copy to clipboard operation
chumsky copied to clipboard

Order of `or` parser affects parse success.

Open mohaque0 opened this issue 7 months ago • 6 comments

I noticed a case where changing the order of parsers combined with an or affects whether an expression can be parsed. This is on version 0.10.1.

From what I understood based on the documentation, if either parser can parse the input then the or should succeed.

I was able to reduce my code to this example. Note the comment BUG: for the lines that show the buggy behavior.

    #[derive(Debug, Clone)]
    pub enum Val {
        Integer(i64),
    }
    
    #[derive(Debug, Clone)]
    pub enum Exp {
        Val(Val),
    
        Neg(Box<Exp>),
        Mul(Box<Exp>, Box<Exp>),
        Div(Box<Exp>, Box<Exp>),
    }

    fn parse_order_bug_parser<'a>() -> impl Parser<'a, &'a str, Exp> {
        let expr = recursive(|expr| {
            let int = text::int(10)
                .map(|s: &str| Exp::Val(Val::Integer(s.parse().unwrap())))
                .padded();

            let atom = 
                    int
                    .or(expr.clone().delimited_by(just('('), just(')')));

            let op = |c| just(c).padded();

            let unary = op('-')
                .repeated()
                .foldr(atom.clone(), |_op, rhs| Exp::Neg(Box::new(rhs)));

            let product = atom.clone()
                .foldl(
                    op('*').to(Exp::Mul as fn(_, _) -> _)
                        .or(op('/').to(Exp::Div as fn(_, _) -> _))
                        .then(atom.clone()).repeated(),
                    |lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs))
                );

            // BUG: This order fails.
            unary.or(product)

            // However, this order succeeds.
            //product.or(unary)
        });

        expr.then_ignore(end())
    }

    #[test]
    fn parse_order_bug() {
        let result = parse_order_bug_parser().parse("5 * (----5)");

        println!("{:?}", result);
        result.unwrap();
    }

mohaque0 avatar May 23 '25 03:05 mohaque0

The unary parser will always succeed when it sees the first character (5), because it is a valid unary expression.

You're probably looking to write a precedence climbing parser, in which case your product parser should invoke your unary parser.

Additionally, .then_ignore(end()) is no longer required as of 0.10 :)

zesterer avatar May 23 '25 20:05 zesterer

Thanks for the help! I'm not sure I understand your response.

The string I'm trying to parse is 5 * (----5) which can only be a product.

If I parse this using unary.or(product) the result is a failure which is strange. I expect the unary part to fail because of the * character. Then the product part should succeed and look like Mul(Value(5), Neg(Neg(Neg(Neg(5))))). The final result should be success.

If I parse this using the line product.or(unary) then the result is success as expected.

Would we ever expect a parser a.or(b) to fail/succeed differently from b.or(a)?

mohaque0 avatar May 23 '25 22:05 mohaque0

The problem you have here is that your input is not just a product: 5 is, by itself, a perfectly valid unary too. So the parser sees that, decides it's found the correct solution, and then continues onward. Usually when parsing operators like this, you nest them together according to precedence (or use the pratt combinator if you want an easier time) by having your product parser look for unary as its operands rather than having a big or chain at the bottom with ambiguous precedence.

zesterer avatar May 24 '25 08:05 zesterer

Try this:

    #[derive(Debug, Clone)]
    pub enum Val {
        Integer(i64),
    }
    
    #[derive(Debug, Clone)]
    pub enum Exp {
        Val(Val),
    
        Neg(Box<Exp>),
        Mul(Box<Exp>, Box<Exp>),
        Div(Box<Exp>, Box<Exp>),
    }

    fn parse_order_bug_parser<'a>() -> impl Parser<'a, &'a str, Exp> {
        let expr = recursive(|expr| {
            let int = text::int(10)
                .map(|s: &str| Exp::Val(Val::Integer(s.parse().unwrap())))
                .padded();

            let atom = 
                    int
                    .or(expr.clone().delimited_by(just('('), just(')')));

            let op = |c| just(c).padded();

            let unary = op('-')
                .repeated()
                .foldr(atom.clone(), |_op, rhs| Exp::Neg(Box::new(rhs)));

            let product = unary.clone()
                .foldl(
                    op('*').to(Exp::Mul as fn(_, _) -> _)
                        .or(op('/').to(Exp::Div as fn(_, _) -> _))
                        .then(unary.clone()).repeated(),
                    |lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs))
                );

            product
        });

        expr
    }

    #[test]
    fn parse_order_bug() {
        let result = parse_order_bug_parser().parse("5 * (----5)");

        println!("{:?}", result);
        result.unwrap();
    }

zesterer avatar May 24 '25 08:05 zesterer

I see what you're saying. I guess I misunderstood the docs. I'm referring to the docs here where it says Parse one thing or, on failure, another thing.

So what I thought that meant is that if the order is unary.or(product) the parser will first attempt unary. It will see 5 and then as you say "decides it's found the correct solution, and then continues onward." It will now see spurious characters and then fail. The or(product) part will then get evaluated and succeed.

So I guess it is more correct to understand that the or() is treating two kinds of failures differently:

  1. If the failure is a character that doesn't match the sub-parser (unary in this case) then the lhs will fail and trigger the rhs.
  2. If the failure is from spurious characters at the end of the input then the lhs will succeed, but the larger .or() clause will fail.

Is that right?

mohaque0 avatar May 25 '25 22:05 mohaque0

Parsers are lazy when evaluating sections of input. unary will succeed on 5, and then it'll pass control on to whatever follows. Unfortunately, in your original parser, nothing follows unary - so the parser will generate an error when it doesn't find the end of input.

zesterer avatar May 25 '25 22:05 zesterer