nom
nom copied to clipboard
nom_recipes.md discussion of Rust identifier example is not quite right
(Apologies for the terse issue, and not just filing a pull request, but I have to log off momentarily.)
Here in nom_recipes.md:
https://github.com/Geal/nom/blob/294ffb3d9e0ade2c3b7ddfff52484b6d643dcce1/doc/nom_recipes.md?plain=1#L135
The documentation says:
The first alt parser would recognize h. ...
But I think that isn't quite correct. The first alt parser (alpha1) consumes as many alpha-only characters as it can, I think: it consumes hello_world. Then the second parser consumes 123abc.
The OP is mostly correct. I think it's actually just a simple typo. If one were to replace the alpha1 parser with a take(1), the documentation would make complete sense.
Here is an example and potential solution:
use nom::{
IResult,
branch::alt,
multi::many0_count,
combinator::recognize,
sequence::pair,
character::complete::{alpha1, alphanumeric1},
bytes::complete::tag,
};
pub fn identifier(input: &str) -> IResult<&str, &str> {
recognize(pair(
alt((take(1usize), tag("_"))),
many0_count(alt((alphanumeric1, tag("_")))),
))(input)
}
The other alternative is to rewrite the documentation comments. This change would likely cause the least amount of friction.
Wouldn't the take accept any character, while we want to limit it to accepting only a-zA-Z_ for the first character?
Yes, it would accept any character. I proposed this change because it makes the code example consistent with the wording in the existing documentation.
I believe the docs contain a simple typo where alpha1 is being treated as of it simple consumes 1 alphabetic token.