Help needed with writing ambiguous parser

Open: gyzerok opened this issue 2 years ago • 1 comment

Hello everyone!

Unfortunately I didn't find help elsewhere. Hopefully it's ok to ask in the issues. My apologies if not, I will close it.

It feels like there is something in nom that I can't grasp when it comes to writing ambiguous parsers. I am having similar problems in different places, but to keep it simple let's look at one particular example. That said, any general guidance on how to approach such problems would be greatly appreciated.

In the code example below I am trying to parse a URL-like string such as reddit.com or api.reddit.com. However, this code fails the following test:

assert_eq!(regname("reddit.com."), Ok((".", "reddit.com")))

As I understand it, since the domain function expects each label to be terminated with a ., my input gets recognized as two domains with no tld (instead of domain + tld), and so it is rejected.

How can I make it work properly?

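// Imports are not shown in the original post; these assume nom 7
// and the `complete` variants of the parsers.
use nom::{
    branch::alt,
    bytes::complete::tag,
    character::complete::{alpha1, alphanumeric1},
    combinator::{recognize, verify},
    error::context,
    multi::many1,
    sequence::{pair, terminated},
    IResult,
};

fn regname(i: &str) -> IResult<&str, &str> {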
    context("regname", recognize(pair(many1(domain), tld)))(i)
}

fn tld(i: &str) -> IResult<&str, &str> {
    context(
        "tld",
        verify(
            recognize(many1(
                alpha1,
            )),
            // This predicate does accept "com"; the other tests,
            // which have no trailing dot, do pass
            is_known_tld,
        ),
    )(i)
}

fn domain(i: &str) -> IResult<&str, &str> {
    context(
        "domain",
        recognize(terminated(
            many1(alt((
                alphanumeric1,
                tag("-"),
            ))),
            tag("."),
        )),
    )(i)
}

Thank you!

gyzerok · May 24 '23 05:05

many1(domain) is greedy. reddit. is consumed by domain, and com. is also consumed by domain, so the whole of reddit.com. is consumed by many1(domain) and the remaining input is "".

pair(many1(domain), tld) = pair(Ok(("", ["reddit.", "com."])), Err) => Err

tld cannot parse "", so it returns an error. nom does not backtrack into many1 to hand com back to tld, so pair fails as well, and with it the whole of:

    context("regname", recognize(pair(many1(domain), tld)))(i)

arashatt · Dec 08 '24 23:12
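
The thread stops at the diagnosis, so here is a minimal sketch of one possible fix. It is not code from either poster. The idea is to stop treating the dot as part of a domain label: parse labels separated by dots, consume a dot only when another label follows it, and check the last label against the TLD list. To keep it self-contained the sketch uses plain Rust with no nom dependency and returns Option<(rest, matched)> instead of IResult. The label helper and the three-entry is_known_tld are hypothetical stand-ins for the ones in the question.

```rust
// Hypothetical stand-in for the `is_known_tld` predicate in the question.
fn is_known_tld(s: &str) -> bool {
    matches!(s, "com" | "org" | "net")
}

// One label: a non-empty run of ASCII alphanumerics or '-'.
// Returns (remaining input, label), or None if no label starts here.
fn label(i: &str) -> Option<(&str, &str)> {
    let end = i
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .unwrap_or(i.len());
    if end == 0 {
        None
    } else {
        Some((&i[end..], &i[..end]))
    }
}

// Labels separated by single dots. A dot is consumed only when another
// label follows it, so a trailing dot stays in the remaining input.
fn regname(i: &str) -> Option<(&str, &str)> {
    let (mut rest, mut last) = label(i)?;
    let mut count = 1;
    while let Some(after_dot) = rest.strip_prefix('.') {
        match label(after_dot) {
            Some((r, l)) => {
                rest = r;
                last = l;
                count += 1;
            }
            // Nothing after the dot: leave the dot unconsumed and stop.
            None => break,
        }
    }
    let matched = &i[..i.len() - rest.len()];
    // Require at least one domain label followed by a known TLD.
    if count >= 2 && is_known_tld(last) {
        Some((rest, matched))
    } else {
        None
    }
}
```

With this shape, regname("reddit.com.") yields Some((".", "reddit.com")), which mirrors the Ok((".", "reddit.com")) expected by the failing test. In nom itself the same idea could probably be expressed with separated_list1(tag("."), label) wrapped in recognize and verify, or by guarding the dot with peek(label), but that variant is untested against the code in the question.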