
`recognize(multispace0)` broken if `multispace0` consumes all

Open chrjabs opened this issue 11 months ago • 2 comments

While updating one of my projects to nom 8, I noticed that one of my tests started failing. I narrowed it down to the combination of `nom::combinator::recognize` and `nom::character::complete::multispace0`, which returns the wrong result when `multispace0` consumes the entire input.

In more detail, when adding the following test to src/combinator/tests.rs, cases A-C pass, but case D fails. This seems to have worked correctly in nom 7.

#[test]
fn recognize_issue() {
  use crate::character::complete::{multispace0, multispace1};

  let input = "\na";
  // Case A
  assert_eq!(
    recognize::<_, crate::error::Error<_>, _>(multispace1).parse(input),
    Ok(("a", "\n"))
  );
  // Case B
  assert_eq!(
    recognize::<_, crate::error::Error<_>, _>(multispace0).parse(input),
    Ok(("a", "\n"))
  );

  let input = "\n";
  // Case C
  assert_eq!(
    recognize::<_, crate::error::Error<_>, _>(multispace1).parse(input),
    Ok(("", "\n"))
  );
  // Case D
  assert_eq!(
    recognize::<_, crate::error::Error<_>, _>(multispace0).parse(input),
    Ok(("", "\n"))
  );
}

chrjabs avatar Jan 29 '25 13:01 chrjabs

Thanks for the test, this was a nice project for me to get used to debugging Rust. I've found the problem and opened a PR to fix it: https://github.com/rust-bakery/nom/pull/1811

marcdejonge avatar Jan 30 '25 15:01 marcdejonge

I have run into a similar bug in 8.0.0 when using `recognize` and `consumed` with a `digit0` parser that consumes the entire input:

use nom::{Parser, character::complete::digit0, combinator::recognize};

#[test]
fn digit_parser() {
    let mut p = recognize(digit0::<&str, ()>);

    // If `digit0` doesn't consume the entire input then everything works as expected
    let s = "1234A";
    let (rem, res) = p.parse(s).unwrap();
    assert_eq!(rem, "A");
    assert_eq!(res, "1234");

    // But if `digit0` does consume the entire input then the result is incorrect
    let s = "1234";
    let (rem, res) = p.parse(s).unwrap();
    assert_eq!(rem, "");
    assert_eq!(res, "1234");
}

That final assert_eq! panics with:

assertion `left == right` failed
  left: ""
 right: "1234"

mbbutler avatar Apr 05 '25 00:04 mbbutler

I ran into exactly the same issue as @mbbutler: with `digit0` inside `recognize`, the trailing input is consumed but is missing from the segment that `recognize` returns:

use nom::Parser;
use nom::character::complete::{char, digit0, digit1};
use nom::combinator::recognize;
use nom::error::ParseError;

#[test]
fn test_recognize_and_digit0() {
    fn p<'a, E: ParseError<&'a str>>() -> impl Parser<&'a str, Output = &'a str, Error = E> {
        recognize((digit1, char('.'), digit0))
    }

    let input = "123.456";
    let (rem, res) = p::<nom::error::Error<_>>().parse(input).unwrap();
    assert_eq!(rem, "");
    assert_eq!(res, "123.456");
}

The final assert_eq! panics with:

assertion `left == right` failed
  left: "123."
 right: "123.456"

It is surprising that the end of the input ("456") shows up in neither part of `recognize`'s return value, neither the remaining input nor the recognized input, so I think this must be a bug in nom.

As a workaround, I notice I can get the results I want if I use nom::bytes::complete::take_while(|c: char| c.is_ascii_digit()) instead of digit0.

I also notice that digit1 does not suffer from this problem if I use it in place of digit0 here.

Macil avatar Aug 25 '25 07:08 Macil

I've made a workaround to use until the fix in https://github.com/rust-bakery/nom/pull/1811 gets merged:

use nom::{Input, Offset, OutputMode, PResult, Parser, error::ParseError};

/// [`nom::combinator::recognize`] but with a workaround for
/// https://github.com/rust-bakery/nom/issues/1808
pub fn recognize<I: Clone + Offset + Input, E: ParseError<I>, F>(
    parser: F,
) -> impl Parser<I, Output = I, Error = E>
where
    F: Parser<I, Error = E>,
{
    nom::combinator::recognize(fix_parser_issue_1808(parser))
}

/// Fixes any parser that mistakenly returns an empty remaining input reference
/// not from the end of the input.
fn fix_parser_issue_1808<I, O, E, F>(f: F) -> impl Parser<I, Output = O, Error = E>
where
    I: Input + Clone,
    E: ParseError<I>,
    F: Parser<I, Output = O, Error = E>,
{
    Issue1808Fixer { parser: f }
}

/// Parser implementation for [fix_parser_issue_1808]
struct Issue1808Fixer<F> {
    parser: F,
}

impl<I, F> Parser<I> for Issue1808Fixer<F>
where
    I: Input + Clone,
    F: Parser<I>,
{
    type Output = F::Output;
    type Error = F::Error;

    #[inline(always)]
    fn process<OM: OutputMode>(&mut self, input: I) -> PResult<OM, I, Self::Output, Self::Error> {
        let i = input.clone();
        match self.parser.process::<OM>(i) {
            Ok((i, result)) => {
                if i.input_len() == 0 {
                    return Ok((input.take_from(input.input_len()), result));
                }
                Ok((i, result))
            }
            Err(e) => Err(e),
        }
    }
}

#[cfg(test)]
mod test {
    use super::*;
    use nom::character::complete::digit0;

    #[test]
    fn test_recognize_and_digit0() {
        let input = "12345";

        let (rem, res) = recognize(digit0::<&str, ()>).parse(input).unwrap();

        assert_eq!(rem, "");
        assert_eq!(res, "12345");
    }
}
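
For anyone wondering why an empty remainder in the wrong place loses input, here is a minimal std-only model of the failure, not nom's actual code. It assumes that `recognize` derives the recognized slice from the offset between the start of the input and the start of the remainder, and that the affected parsers return an empty remainder positioned at the start of their input instead of the end. The names `recognize_by_offset`, `buggy_digit0`, and `fix_remaining` are hypothetical and exist only for this sketch.

```rust
/// Simplified stand-in for `recognize` (an assumption, not nom's code):
/// the recognized slice is everything between the start of `input` and
/// the start of `remaining`. `remaining` must be a subslice of `input`.
fn recognize_by_offset<'a>(input: &'a str, remaining: &'a str) -> (&'a str, &'a str) {
    let offset = remaining.as_ptr() as usize - input.as_ptr() as usize;
    (remaining, &input[..offset])
}

/// Hypothetical parser that models the bug: when every character is a
/// digit, it returns an empty remainder located at the START of its input.
fn buggy_digit0(input: &str) -> &str {
    match input.find(|c: char| !c.is_ascii_digit()) {
        Some(i) => &input[i..],
        None => &input[..0], // wrong position: should be &input[input.len()..]
    }
}

/// Same idea as `Issue1808Fixer` above: if the remainder is empty,
/// replace it with the empty slice at the END of the original input.
fn fix_remaining<'a>(input: &'a str, remaining: &'a str) -> &'a str {
    if remaining.is_empty() {
        &input[input.len()..]
    } else {
        remaining
    }
}
```

Under these assumptions, feeding `"123.456"` through the model with `buggy_digit0` applied to the `"456"` tail gives an empty remainder at offset 4, so the recognized slice is `"123."`, matching the failure reported above. Passing the remainder through `fix_remaining` first moves it to offset 7 and the recognized slice becomes `"123.456"`.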

Macil avatar Aug 25 '25 09:08 Macil