`recognize(multispace0)` broken if `multispace0` consumes all
In updating one of my projects to nom 8, I noticed that one of my tests started failing. I narrowed it down to the combination of `nom::combinator::recognize` and `nom::character::complete::multispace0`, which is broken when `multispace0` consumes the entire input.
In more detail, when I add the following test to `src/combinator/tests.rs`, cases A-C pass but case D fails. This seems to have worked correctly in nom 7.
```rust
#[test]
fn recognize_issue() {
    use crate::character::complete::{multispace0, multispace1};

    let input = "\na";
    // Case A
    assert_eq!(
        recognize::<_, crate::error::Error<_>, _>(multispace1).parse(input),
        Ok(("a", "\n"))
    );
    // Case B
    assert_eq!(
        recognize::<_, crate::error::Error<_>, _>(multispace0).parse(input),
        Ok(("a", "\n"))
    );

    let input = "\n";
    // Case C
    assert_eq!(
        recognize::<_, crate::error::Error<_>, _>(multispace1).parse(input),
        Ok(("", "\n"))
    );
    // Case D
    assert_eq!(
        recognize::<_, crate::error::Error<_>, _>(multispace0).parse(input),
        Ok(("", "\n"))
    );
}
```
Thanks for the test; this was a nice project for getting used to debugging Rust. I've found the problem and opened a PR to fix it: https://github.com/rust-bakery/nom/pull/1811
I have run into a similar bug in 8.0.0 when using `recognize` and `consumed` with a `digit0` parser that consumes the entire input:
```rust
use nom::{Parser, character::complete::digit0, combinator::recognize};

#[test]
fn digit_parser() {
    let mut p = recognize(digit0::<&str, ()>);

    // If `digit0` doesn't consume the entire input then everything works as expected
    let s = "1234A";
    let (rem, res) = p.parse(s).unwrap();
    assert_eq!(rem, "A");
    assert_eq!(res, "1234");

    // But if `digit0` does consume the entire input then the result is incorrect
    let s = "1234";
    let (rem, res) = p.parse(s).unwrap();
    assert_eq!(rem, "");
    assert_eq!(res, "1234");
}
```
That final `assert_eq!` panics with:
```text
assertion `left == right` failed
  left: ""
 right: "1234"
```
I ran into exactly the same issue as @mbbutler: using `digit0` within `recognize` caused input to be consumed without appearing in the segment returned by `recognize`:
```rust
use nom::Parser;
use nom::character::complete::{char, digit0, digit1};
use nom::combinator::recognize;
use nom::error::ParseError;

#[test]
fn test_recognize_and_digit0() {
    fn p<'a, E: ParseError<&'a str>>() -> impl Parser<&'a str, Output = &'a str, Error = E> {
        recognize((digit1, char('.'), digit0))
    }

    let input = "123.456";
    let (rem, res) = p::<nom::error::Error<_>>().parse(input).unwrap();
    assert_eq!(rem, "");
    assert_eq!(res, "123.456");
}
```
```text
assertion `left == right` failed
  left: "123."
 right: "123.456"
```
It is surprising that the end of the input (`"456"`) appears in neither half of the value `recognize` returns: it is not in the remaining input, and it is not in the recognized output. So I think there must be a bug in nom.
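One way to see how input can vanish: nom's `recognize` derives the recognized slice from the `Offset` between the original input and the remainder returned by the inner parser, so an empty remainder that points anywhere other than the end of the input shortens the output. The std-only sketch below models that computation with raw `&str` pointers. The helpers `offset` and `recognized` are hypothetical stand-ins, not nom APIs, and the "bad" remainder is an assumption chosen because it reproduces the reported `"123."`.

```rust
/// Byte offset of `rest` relative to `start`, computed from pointer
/// addresses (the same idea as nom's `Offset` impl for `&str`).
fn offset(start: &str, rest: &str) -> usize {
    rest.as_ptr() as usize - start.as_ptr() as usize
}

/// Hypothetical model of `recognize`: everything between the start of the
/// input and the start of the remainder counts as recognized.
fn recognized<'a>(input: &'a str, rest: &'a str) -> (&'a str, &'a str) {
    (rest, &input[..offset(input, rest)])
}

fn main() {
    let input = "123.456";

    // Correct: the empty remainder points at the end of `input`.
    let good_rest = &input[input.len()..];
    assert_eq!(recognized(input, good_rest), ("", "123.456"));

    // Assumed buggy case: an empty remainder that points at the start of
    // "456" (where `digit0` began) instead of the end of the input.
    let bad_rest = &input[4..4];
    assert_eq!(recognized(input, bad_rest), ("", "123."));
}
```

Both remainders compare equal to `""`, which is why `rem` looks fine in the failing test while `res` is truncated.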
As a workaround, I can get the results I want by using `nom::bytes::complete::take_while(|c: char| c.is_ascii_digit())` instead of `digit0`.
I also notice that `digit1` does not suffer from this problem if I use it in place of `digit0` here.
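The `take_while` workaround presumably behaves because its split keeps the empty remainder anchored at the end of the input. Below is a std-only sketch of a `digit0`-like parser built on `str::split_at`, composed into a simplified "digits, optional `.`, digits" parser whose recognized output is computed from pointer offsets as `recognize` does. The names `digits0` and `number` are hypothetical; this is an illustration, not nom's implementation, and it omits `digit1`'s "at least one digit" check and all error handling.

```rust
/// Hypothetical `digit0` replacement: consumes zero or more ASCII digits.
/// `split_at` returns a remainder that starts right after the consumed
/// digits, so an empty remainder always sits at the end of `input`.
fn digits0(input: &str) -> (&str, &str) {
    let n = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(n);
    (rest, digits)
}

/// Simplified stand-in for `recognize((digit1, char('.'), digit0))`:
/// returns (remaining input, recognized input).
fn number(input: &str) -> (&str, &str) {
    let (rest, _) = digits0(input);
    // `strip_prefix` returns a subslice of `rest`, so positions are preserved.
    let rest = rest.strip_prefix('.').unwrap_or(rest);
    let (rest, _) = digits0(rest);
    // Recognized = everything from the start of `input` up to `rest`.
    let consumed = rest.as_ptr() as usize - input.as_ptr() as usize;
    (rest, &input[..consumed])
}

fn main() {
    assert_eq!(number("123.456"), ("", "123.456"));
    assert_eq!(number("123.456abc"), ("abc", "123.456"));
}
```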
I've made a workaround to use until the fix in https://github.com/rust-bakery/nom/pull/1811 gets merged:
```rust
use nom::{Input, Offset, OutputMode, PResult, Parser, error::ParseError};

/// [`nom::combinator::recognize`] but with a workaround for
/// https://github.com/rust-bakery/nom/issues/1808
pub fn recognize<I: Clone + Offset + Input, E: ParseError<I>, F>(
    parser: F,
) -> impl Parser<I, Output = I, Error = E>
where
    F: Parser<I, Error = E>,
{
    nom::combinator::recognize(fix_parser_issue_1808(parser))
}

/// Fixes any parser that mistakenly returns an empty remaining input reference
/// not from the end of the input.
fn fix_parser_issue_1808<I, O, E, F>(f: F) -> impl Parser<I, Output = O, Error = E>
where
    I: Input + Clone,
    E: ParseError<I>,
    F: Parser<I, Output = O, Error = E>,
{
    Issue1808Fixer { parser: f }
}

/// Parser implementation for [fix_parser_issue_1808]
struct Issue1808Fixer<F> {
    parser: F,
}

impl<I, F> Parser<I> for Issue1808Fixer<F>
where
    I: Input + Clone,
    F: Parser<I>,
{
    type Output = F::Output;
    type Error = F::Error;

    #[inline(always)]
    fn process<OM: OutputMode>(&mut self, input: I) -> PResult<OM, I, Self::Output, Self::Error> {
        let i = input.clone();
        match self.parser.process::<OM>(i) {
            Ok((i, result)) => {
                if i.input_len() == 0 {
                    return Ok((input.take_from(input.input_len()), result));
                }
                Ok((i, result))
            }
            Err(e) => Err(e),
        }
    }
}

#[cfg(test)]
mod test {
    use super::*;
    use nom::character::complete::digit0;

    #[test]
    fn test_recognize_and_digit0() {
        let input = "12345";
        let (rem, res) = recognize(digit0::<&str, ()>).parse(input).unwrap();
        assert_eq!(rem, "");
        assert_eq!(res, "12345");
    }
}
```
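The core of the `Issue1808Fixer` workaround is one rule: if the wrapped parser reports that nothing remains, replace its remainder with `input.take_from(input.input_len())`, an empty value guaranteed to sit at the end of the input. For `&str` that corresponds to `&input[input.len()..]`. The std-only sketch below isolates that normalisation step; `anchor_empty_remainder` and `offset` are hypothetical helpers for illustration, not part of nom.

```rust
/// Hypothetical helper mirroring the check in `Issue1808Fixer::process`:
/// an empty remainder is replaced by the empty slice at the end of `input`,
/// so an offset-based combinator such as `recognize` measures the full length.
fn anchor_empty_remainder<'a>(input: &'a str, rest: &'a str) -> &'a str {
    if rest.is_empty() {
        &input[input.len()..]
    } else {
        rest
    }
}

/// Byte offset of `rest` relative to `start`, from pointer addresses.
fn offset(start: &str, rest: &str) -> usize {
    rest.as_ptr() as usize - start.as_ptr() as usize
}

fn main() {
    let input = "12345";

    // A misplaced empty remainder pointing at the start of the input,
    // which an offset-based `recognize` would turn into an empty match.
    let bad_rest = &input[0..0];
    assert_eq!(offset(input, bad_rest), 0);

    // After anchoring, the offset covers the whole input.
    let fixed = anchor_empty_remainder(input, bad_rest);
    assert_eq!(offset(input, fixed), input.len());
    assert_eq!(&input[..offset(input, fixed)], "12345");
}
```

Non-empty remainders pass through untouched, so the wrapper only changes behaviour in the "consumed everything" case that triggers the bug.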