cats-parse icon indicating copy to clipboard operation
cats-parse copied to clipboard

Why is Parser0#repSep not possible?

Open nightscape opened this issue 3 years ago • 6 comments

Hi all,

I'm trying to port scala-uri to cats-parse. One difficulty I'm facing is with parsing path parts including empty ones:

a/b/c
/a/b/c
a//c
/a//c

What I would like to do is something like this (simplified)

  def _path_segment: Parser0[String] = Parser.until0(charIn("/?#"))
  def _path: Parser[String] = (Parser.char('/').? ~ _path_segment.repSep0(char('/')).string

but this doesn't work, as Parser0 doesn't have repSep or repSep0 defined. I understand that Parser0#rep is problematic, as one could easily run the empty parser infinitely, but in my naive thinking this problem shouldn't exist with repSep, right?

Or is there a nicer way to do this in cats-parse?

nightscape avatar Apr 25 '21 07:04 nightscape

A combinator to repeat a Parser0 with a separator that's a Parser doesn't exist in the library, but looks safe to me. They could look like

def rep0sep0[A](data: Parser0[A], separator: Parser[Any]): Parser0[List[A]] =
  (data.? ~ (separator *> data).rep0).map { case (a, as) => a ++: as }

def rep0sep[A](data: Parser0[A], separator: Parser[Any]): Parser0[NonEmptyList[A]] = (data ~ (separator *> data).rep0).map {
  case (a, as) => NonEmptyList(a, as)
}

Then your use case becomes

val sep = char('/')
val part = until0(charIn("/?#"))
val path = sep.? *> rep0sep0(part, sep)
val _path = path.string

martijnhoekstra avatar Apr 29 '21 09:04 martijnhoekstra

We could add a function like this. I'd be happy to do that.

johnynek avatar Apr 29 '21 17:04 johnynek

By the way, I think a totally empty path is not a valid path is it?

So, even this function isn't great since it will match nothing.

It seems like a better way to write this parser would be to consider a few cases:

  1. nonEmpty string with a tail that has /a/b/c... where the a, b, c can be empty
  2. / followed by a/b/c... where a, b, c can be empty

In this way, you still get a Parser[Path] and not a Parser0.

johnynek avatar May 06 '21 01:05 johnynek

I thought about adding this code... but then I realized that we probably want to be a bit more thoughtful since we have min and max accepting versions of repSep, we probably want to mirror that.

But then, if you repeat 2 or more times with a sep that is non-empty you get a Parser not a Parser0, so maybe you want to retain that, and have require min >= 2, then you have three cases: 0 items (Parser0), 1 item (Parser0), 2 items (has 1 sep, so it is a Parser).

In this way, you have (2 or more) | onetime.map(_ :: Nil) | pure(Nil) which would naturally degrade back down to Parser0 by the last two ors.

johnynek avatar May 06 '21 02:05 johnynek

But then, if you repeat 2 or more times with a sep that is non-empty you get a Parser not a Parser0, so maybe you want to retain that, and have require min >= 2, then you have three cases: 0 items (Parser0), 1 item (Parser0), 2 items (has 1 sep, so it is a Parser).

I have some difficulties understanding why we need Parser0 and Parser actually. What is the shortcoming to merge this two thing together?

zsluedem avatar Oct 09 '21 02:10 zsluedem

Unbounded repetition on a Parser0 isn't safe: it can just run forever allocating more memory parsing an infinite number of items making no progress.

This is not an uncommon error when working with Parser combinators.

The blog post linked in the read me of this repo covers this:

https://posco.medium.com/designing-a-parsing-library-in-scala-d5076de52536

johnynek avatar Oct 09 '21 06:10 johnynek