cats-parse
cats-parse copied to clipboard
Why is Parser0#repSep not possible?
Hi all,
I'm trying to port scala-uri to cats-parse. One difficulty I'm facing is with parsing path parts including empty ones:
a/b/c
/a/b/c
a//c
/a//c
What I would like to do is something like this (simplified)
def _path_segment: Parser0[String] = Parser.until0(charIn("/?#"))
def _path: Parser[String] = (Parser.char('/').? ~ _path_segment.repSep0(char('/')).string
but this doesn't work, as Parser0
doesn't have repSep
or repSep0
defined.
I understand that Parser0#rep
is problematic, as one could easily run the empty parser infinitely, but in my naive thinking this problem shouldn't exist with repSep
, right?
Or is there a nicer way to do this in cats-parse?
A combinator to repeat a Parser0
with a separator that's a Parser
doesn't exist in the library, but looks safe to me. They could look like
def rep0sep0[A](data: Parser0[A], separator: Parser[Any]): Parser0[List[A]] =
(data.? ~ (separator *> data).rep0).map { case (a, as) => a ++: as }
def rep0sep[A](data: Parser0[A], separator: Parser[Any]): Parser0[NonEmptyList[A]] = (data ~ (separator *> data).rep0).map {
case (a, as) => NonEmptyList(a, as)
}
Then your use case becomes
val sep = char('/')
val part = until0(charIn("/?#"))
val path = sep.? *> rep0sep0(part, sep)
val _path = path.string
We could add a function like this. I'd be happy to do that.
By the way, I think a totally empty path is not a valid path is it?
So, even this function isn't great since it will match nothing.
It seems like a better way to write this parser would be to consider a few cases:
- nonEmpty string with a tail that has /a/b/c... where the a, b, c can be empty
- / followed by a/b/c... where a, b, c can be empty
In this way, you still get a Parser[Path]
and not a Parser0
.
I thought about adding this code... but then I realized that we probably want to be a bit more thoughtful since we have min and max accepting versions of repSep, we probably want to mirror that.
But then, if you repeat 2 or more times with a sep that is non-empty you get a Parser
not a Parser0
, so maybe you want to retain that, and have require min >= 2, then you have three cases: 0 items (Parser0), 1 item (Parser0), 2 items (has 1 sep, so it is a Parser).
In this way, you have (2 or more) | onetime.map(_ :: Nil) | pure(Nil)
which would naturally degrade back down to Parser0 by the last two ors.
But then, if you repeat 2 or more times with a sep that is non-empty you get a
Parser
not aParser0
, so maybe you want to retain that, and have require min >= 2, then you have three cases: 0 items (Parser0), 1 item (Parser0), 2 items (has 1 sep, so it is a Parser).
I have some difficulties understanding why we need Parser0
and Parser
actually. What is the shortcoming to merge this two thing together?
Unbounded repetition on a Parser0 isn't safe: it can just run forever allocating more memory parsing an infinite number of items making no progress.
This is not an uncommon error when working with Parser combinators.
The blog post linked in the read me of this repo covers this:
https://posco.medium.com/designing-a-parsing-library-in-scala-d5076de52536