nom parsing number with an "E" fails if it's not followed by another number


Rust version : rustc 1.37.0 (eae3437df 2019-08-13)
nom version : 5.0.0
nom compilation features used: either default, or with default feature disabled

For context, I'm trying to parse a number followed by a unit, and that unit can be "EB" for exabyte.

Test case

with default features:

fn main() {
    use nom::number::complete::float;

    let parser = |s| float::<_, ()>(s);
    assert_eq!(parser("1B"), Ok(("B", 1.0)));
    assert_eq!(parser("1E"), Ok(("E", 1.0)));

    // expected
    assert_eq!(parser("1EB"), Ok(("EB", 1.0)));
    // actually
    assert_eq!(parser("1EB"), Ok(("B", 1.0)));
}

with default features disabled:

fn main() {
    use nom::number::complete::float;
    use nom::Err;

    let parser = |s| float::<_, ()>(s);
    assert_eq!(parser("1B"), Ok(("B", 1.0)));

    // expected
    assert_eq!(parser("1E"), Ok(("E", 1.0)));
    // actually
    assert_eq!(parser("1E"), Err(Err::Failure(())));

    // expected
    assert_eq!(parser("1EB"), Ok(("EB", 1.0)));
    // actually
    assert_eq!(parser("1EB"), Err(Err::Failure(())));
}

Aug 17 '19 13:08 mockersf

An E following a number can be part of the syntax for a number: it's the exponential notation, as in '1E2'.

If you do not need floating point numbers you could combine the digit parser and the parse_to combinator.
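The integer-only approach can be sketched without nom, using only the standard library. This is a minimal illustration of the idea (take the leading digits, parse them, keep the rest as the unit), not the nom API; the function name `integer_with_rest` is hypothetical.

```rust
/// Splits a leading run of ASCII digits from the rest of the input and
/// parses it as an integer. Returns `None` if there are no leading digits
/// or if the value does not fit in a `u64`.
/// (Hypothetical helper, standing in for "digit parser + parse_to".)
fn integer_with_rest(input: &str) -> Option<(u64, &str)> {
    // Byte index of the first character that is not an ASCII digit.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let (digits, rest) = input.split_at(end);
    digits.parse::<u64>().ok().map(|n| (n, rest))
}
```

Because only digits are consumed, an input such as "1EB" is split into the integer 1 and the unit "EB", and the "E" is never interpreted as an exponent.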

Aug 22 '19 08:08 Geal

Sadly I do need floating point numbers...

I tried following the code, and this seems hard to fix: the change would have to go either into the lexical-core crate (which is not easy to understand) or into recognize_float (which would become more complex).

I had something working with nom 4:

named!(
    parse_float<types::CompleteStr, f64>,
    complete!(flat_map!(recognize_float, parse_to!(f64)))
);

Aug 26 '19 20:08 mockersf

I'm kind of a beginner so I don't dare to open a PR, but the current version also recognizes "." and "E" as numbers, so I managed to cook this up by merging recognize_float (with a cut removed) and double:

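// Imports needed by the snippet (nom 5 module paths).
use nom::branch::alt;
use nom::character::complete::{char, digit1};
use nom::combinator::{map, opt, recognize};
use nom::error::{ErrorKind, ParseError};
use nom::sequence::{pair, tuple};
use nom::{Err, IResult};

fn parse_float<'a, E: ParseError<&'a str>>(input: &'a str) -> IResult<&'a str, f64, E> {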
    match recognize(tuple((
        opt(alt((char('+'), char('-')))),
        alt((
            map(tuple((digit1, opt(pair(char('.'), opt(digit1))))), |_| ()),
            map(tuple((char('.'), digit1)), |_| ()),
        )),
        opt(tuple((
            alt((char('e'), char('E'))),
            opt(alt((char('+'), char('-')))),
            digit1,
        ))),
    )))(input)
    {
        Err(e) => Err(e),
        Ok((i, s)) => match s.parse::<f64>() {
            Ok(n) => Ok((i, n)),
            Err(_) => Err(Err::Error(E::from_error_kind(i, ErrorKind::Float))),
        },
    }
}

Hope it helps!

Oct 20 '19 23:10 AndresParraSilva

This cut inside recognize_float is breaking my parsers too:

use nom::branch::alt;
use nom::number::complete::recognize_float;
use nom::bytes::complete::tag;
use nom::error::ErrorKind;

fn main() {
    let result = alt((
        recognize_float::<_, (_, ErrorKind)>,
        tag("1esomething"),
    ))("1esomething");

    println!("result = {:#?}", result);
}

gives an Err::Failure rather than an Ok.
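The reason the second branch is never tried is that nom's alt only falls back to the next parser on a recoverable Err::Error; an Err::Failure (which is what cut produces) is returned immediately. The following is a toy model of that rule written with the standard library only. The names `ParseErr`, `PResult`, `alt2`, `strict_number` and `literal` are all hypothetical stand-ins, not nom items.

```rust
/// Simplified stand-in for nom's error type: `Error` is recoverable,
/// `Failure` is not.
#[derive(Debug, PartialEq)]
enum ParseErr {
    Error,
    Failure,
}

/// On success: (remaining input, matched text).
type PResult<'a> = Result<(&'a str, &'a str), ParseErr>;

/// Tries `first`, and falls back to `second` only on a recoverable error.
fn alt2<'a>(
    first: impl Fn(&'a str) -> PResult<'a>,
    second: impl Fn(&'a str) -> PResult<'a>,
    input: &'a str,
) -> PResult<'a> {
    match first(input) {
        Err(ParseErr::Error) => second(input),
        other => other, // success or Failure is returned as is
    }
}

/// Toy number recognizer: digits, and if an 'e'/'E' follows,
/// exponent digits become mandatory (this models the cut).
fn strict_number(input: &str) -> PResult<'_> {
    let digits = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return Err(ParseErr::Error);
    }
    let rest = &input[digits..];
    if rest.starts_with('e') || rest.starts_with('E') {
        let exp = rest[1..].bytes().take_while(|b| b.is_ascii_digit()).count();
        if exp == 0 {
            return Err(ParseErr::Failure); // no backtracking past this point
        }
        let end = digits + 1 + exp;
        return Ok((&input[end..], &input[..end]));
    }
    Ok((rest, &input[..digits]))
}

/// Matches the literal "1esomething".
fn literal(input: &str) -> PResult<'_> {
    const TAG: &str = "1esomething";
    if input.starts_with(TAG) {
        Ok((&input[TAG.len()..], &input[..TAG.len()]))
    } else {
        Err(ParseErr::Error)
    }
}
```

In this model, `alt2(strict_number, literal, "1esomething")` returns the Failure from the first branch, while swapping the two branches lets the literal match first.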

Jan 05 '20 10:01 YaLTeR

Is this cut() in number/complete.rs:1417 actually even needed?

Sep 05 '22 23:09 TobTobXX

Turns out, nope. At least that's what the test suite and my own testing said...

Sep 05 '22 23:09 TobTobXX

The cut is necessary, because if we're expecting a float and see some digits followed by an E, there must be a valid exponent after that. Removing the cut is bound to break a lot of existing parsers. If you do not want that behaviour, it is easy to rewrite a float parser without that cut.

Dec 30 '22 16:12 Geal

But wouldn't it make sense to end the float before the E in a case where no valid exponent follows?

eg.:

1234e4EB -> float: 1234e4 | other: "EB"
1234EB   -> float: 1234   | other: "EB"

The case that breaks is when you have an "invalid" exponent, which is basically the same as no exponent, as any integer is a valid exponent.

eg. (it is very hard to come up with an invalid exponent. I just chose "x"):

old: 1234ex -> error
new: 1234ex -> float: 1234 | other "ex"
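The proposed behaviour can be sketched with the standard library only. This is an illustration of the rule "consume the exponent only if at least one digit follows", not nom's implementation; `digits_len` and `lenient_float` are hypothetical names.

```rust
/// Length of the longest prefix of `s` made of ASCII digits.
fn digits_len(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_digit()).count()
}

/// Recognizes `[+-] (digits [. [digits]] | . digits) [(e|E) [+-] digits]`
/// at the start of `input`, parses it as `f64`, and returns the rest.
/// If an 'e'/'E' is not followed by a valid exponent, the float ends before it.
fn lenient_float(input: &str) -> Option<(f64, &str)> {
    let bytes = input.as_bytes();
    let mut pos = 0;

    // Optional sign.
    if matches!(bytes.first(), Some(b'+') | Some(b'-')) {
        pos += 1;
    }

    // Mantissa: digits with an optional fraction, or a fraction alone.
    let int_len = digits_len(&input[pos..]);
    pos += int_len;
    if bytes.get(pos) == Some(&b'.') {
        let frac_len = digits_len(&input[pos + 1..]);
        if int_len == 0 && frac_len == 0 {
            return None; // a lone "." is not a number
        }
        pos += 1 + frac_len;
    } else if int_len == 0 {
        return None; // no digits at all, e.g. "E"
    }

    // Optional exponent: only consumed if at least one digit follows.
    if matches!(bytes.get(pos), Some(b'e') | Some(b'E')) {
        let mut exp = pos + 1;
        if matches!(bytes.get(exp), Some(b'+') | Some(b'-')) {
            exp += 1;
        }
        let exp_len = digits_len(&input[exp..]);
        if exp_len > 0 {
            pos = exp + exp_len;
        }
    }

    let (number, rest) = input.split_at(pos);
    number.parse::<f64>().ok().map(|n| (n, rest))
}
```

With this rule, "1234e4EB" splits into 1234e4 and "EB", while "1234EB" and "1234ex" both yield 1234 with the letters left as remaining input.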

> Removing the cut is bound to break a lot of existing parsers.

I can only come up with one example where this behavior would be useful: When validating that an expression ends with a float. And when you want to do that, you'd probably also use complete(), no?

I'm having a really hard time understanding in which situation this behavior is desired.

> If you do not want that behaviour, it is easy to rewrite a float parser without that cut

Yes, I did. But it took me ~2 hours to find out why the function returned an error, then why it was acting "weirdly" (turns out this was intended, but not documented) and finally what to change to make it work. Now, yes, you could make it work in less than 10min, but I'm only using nom, I'm not intimately familiar with it.

If I can get you to rethink this, that'd be great, but if you don't want to change your mind anyway… well then. It's your project. Thank you for your work nonetheless.

Dec 30 '22 17:12 TobTobXX

Parsers have two functions:

1. recognizing valid inputs and transforming them to useful data
2. rejecting invalid inputs

That second point is often overlooked because when developing, we're too often only working on valid inputs, but it is crucial. At the first sign of invalid input, the parser should fail loudly, instead of trying to keep partial data, because that's how you risk working with bad or malicious data. I also recognize that choices like this one in the float parser do not fit every use case, and that's why nom is designed to let people write their own parsers and combinators to extend or replace basic behaviour. So here I believe the better path for you is to write your own float parser.

Honestly, right now I don't even think the float parser should be provided by nom, because every format or language has its own opinion on how floats should be parsed, so it's unlikely to be used in the end. It's just convenient to have it to get started.

Dec 30 '22 17:12 Geal
