Pure functional parser combinator library which supports both applicative and monadic styles of parsing.

Parsecat

Parsecat is a lightweight, pure-functional parser monad transformer and combinator library, which supports both applicative and monadic styles of parsing.

Usage

Supports Scala 2.12 and 2.13.

libraryDependencies += "com.github.izeigerman" %% "parsecat-core" % "0.3.0"
// to include JSON parsers
libraryDependencies += "com.github.izeigerman" %% "parsecat-json" % "0.3.0"

Text parser

The text parser is just a type alias for a specialized ParserT:

type TextParser[A] = ParserT[Id, PagedStream[Char], TextParserContext, TextPosition, A]

It supports two-dimensional (row and column) position tracking and comes with a variety of string and character parsers.

Character and String parsers

Before using the parser, the following imports are required:

scala> import cats.implicits._
import cats.implicits._

scala> import parsecat.parsers.string._
import parsecat.parsers.string._

Let's parse the following string:

scala> val str = "Hello World"
str: String = Hello World

A parser for this string can be defined in a couple of ways. For example:

scala> val parserA = andThen(string("Hello") <* space, string("World"))
parserA: parsecat.ParserT[cats.Id,parsecat.stream.PagedStream[Char],parsecat.parsers.TextParserContext,parsecat.parsers.TextPosition,(String, String)] = parsecat.ParserT@6f92758f

The parser above relies on the applicative properties of ParserT and uses the <* operator to parse both the string "Hello" and a whitespace character, keeping only the result of the first parser. andThen is one of the many combinator functions included in this library. The full list can be found here. Applying this parser to the test string gives the following result:

scala> parseText(parserA, str)
res1: Either[parsecat.ParseError[parsecat.parsers.TextPosition],(String, String)] = Right((Hello,World))

There is also a different style of combining parsers: the monadic one. Exactly the same parser can be defined differently:

scala> val parserM = andThen(string("Hello"), space >> string("World"))
parserM: parsecat.ParserT[cats.Id,parsecat.stream.PagedStream[Char],parsecat.parsers.TextParserContext,parsecat.parsers.TextPosition,(String, String)] = parsecat.ParserT@7daf7131

This time we rely on the monadic properties of ParserT and use the >> operator, a special binding operator that discards the result of the first action. The result produced by this parser is exactly the same:

scala> parseText(parserM, str)
res2: Either[parsecat.ParseError[parsecat.parsers.TextPosition],(String, String)] = Right((Hello,World))
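To make the difference between the two operators concrete, here is a minimal, library-independent sketch. The toy Parser type and all names inside SequencingSketch are hypothetical and only mimic the semantics described above; Parsecat's actual ParserT is considerably richer (it tracks position, context and the underlying effect).

```scala
object SequencingSketch {
  // Hypothetical toy parser: returns the parsed value together with the
  // remaining input, or None on failure. This is NOT Parsecat's ParserT.
  final case class Parser[A](run: String => Option[(A, String)]) {
    def map[B](f: A => B): Parser[B] =
      Parser[B](in => run(in).map { case (a, rest) => (f(a), rest) })

    def flatMap[B](f: A => Parser[B]): Parser[B] =
      Parser[B](in => run(in).flatMap { case (a, rest) => f(a).run(rest) })

    // Run both parsers in order, keep the result of the left one.
    def <*[B](that: Parser[B]): Parser[A] =
      flatMap(a => that.map(_ => a))

    // Run both parsers in order, keep the result of the right one.
    def >>[B](that: => Parser[B]): Parser[B] =
      flatMap(_ => that)
  }

  def string(s: String): Parser[String] =
    Parser[String](in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None)

  val space: Parser[Char] =
    Parser[Char](in => in.headOption.filter(_ == ' ').map(c => (c, in.tail)))

  def andThen[A, B](pa: Parser[A], pb: Parser[B]): Parser[(A, B)] =
    pa.flatMap(a => pb.map(b => (a, b)))

  // The space is consumed on the left side and its result is dropped by <*.
  val parserA: Parser[(String, String)] = andThen(string("Hello") <* space, string("World"))

  // The space is consumed on the right side and its result is dropped by >>.
  val parserM: Parser[(String, String)] = andThen(string("Hello"), space >> string("World"))
}
```

In this sketch both parserA and parserM consume the whole of "Hello World" and yield the pair ("Hello", "World"); the only difference is which side of andThen absorbs the whitespace.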

When parsing is unsuccessful, the error contains detailed information about what went wrong:

scala> import parsecat.parsers.regex._
import parsecat.parsers.regex._

scala> val str = """
     | Hello
     | World
     | """.stripMargin
str: String =
"
Hello
World
"

scala> val parser = andThen(eol >> string("Hello"), eol >> regex("o.+d".r))
parser: parsecat.ParserT[cats.Id,parsecat.stream.PagedStream[Char],parsecat.parsers.TextParserContext,parsecat.parsers.TextPosition,(String, String)] = parsecat.ParserT@789c90f4

scala> parseText(parser, str)
res4: Either[parsecat.ParseError[parsecat.parsers.TextPosition],(String, String)] = Left(parsecat.ParseError: [Parsecat] (row 3, column 1): input doesn't match regex 'o.+d')
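The reported position (row 3, column 1) points at the "W" of "World": the first row is empty, the second contains "Hello", and the regex is tried at the start of the third. The sketch below shows one way such a position could be derived from a character offset. The helper positionAt is hypothetical and is not part of Parsecat, whose TextPosition bookkeeping may work differently.

```scala
object PositionSketch {
  // Hypothetical helper: converts a zero-based character offset into a
  // one-based (row, column) pair by counting line breaks before the offset.
  // Assumes 0 <= offset <= input.length and '\n' as the line separator.
  def positionAt(input: String, offset: Int): (Int, Int) = {
    val consumed = input.take(offset)
    val row = consumed.count(_ == '\n') + 1
    // Column is the distance from the character after the last line break.
    val column = offset - (consumed.lastIndexOf('\n') + 1) + 1
    (row, column)
  }
}

// "\nHello\nWorld\n": the 'W' sits at offset 7.
// PositionSketch.positionAt("\nHello\nWorld\n", 7) == (3, 1)
```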

Here are some other parser application examples:

scala> parseText(many(char('a')), "aabb")
res5: Either[parsecat.ParseError[parsecat.parsers.TextPosition],List[Char]] = Right(List(a, a))

scala> parseText(manyTill(anyChar, char('b')), "aaab")
res6: Either[parsecat.ParseError[parsecat.parsers.TextPosition],List[Char]] = Right(List(a, a, a))

Note: the examples above use the many and manyTill combinators. Although this approach looks appealing, it can create a large number of monadic bindings at runtime, which may degrade performance considerably. Use these combinators carefully and consider the string-specific alternatives (satisfyMany, anyCharTill, oneOfMany, etc.) instead. The same is true for the combinators many1, skipMany and skipMany1.
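The cost difference can be illustrated with a library-independent sketch. The toy Parser type, many and takeMatching below are hypothetical stand-ins: many mimics a generic repetition combinator built from monadic binds, while takeMatching mimics the idea behind a string-specific alternative. They are not Parsecat's implementations, and the signatures of satisfyMany and its siblings are not assumed here.

```scala
object ManySketch {
  // Hypothetical toy parser, not Parsecat's ParserT.
  final case class Parser[A](run: String => Option[(A, String)]) {
    def map[B](f: A => B): Parser[B] =
      Parser[B](in => run(in).map { case (a, rest) => (f(a), rest) })

    def flatMap[B](f: A => Parser[B]): Parser[B] =
      Parser[B](in => run(in).flatMap { case (a, rest) => f(a).run(rest) })
  }

  def satisfy(p: Char => Boolean): Parser[Char] =
    Parser[Char](in => in.headOption.filter(p).map(c => (c, in.tail)))

  // Generic repetition: every matched element costs one flatMap plus one map,
  // and the recursion depth grows with the number of matches.
  // Note: loops forever if `p` succeeds without consuming input.
  def many[A](p: Parser[A]): Parser[List[A]] =
    Parser[List[A]] { in =>
      p.flatMap(a => many(p).map(a :: _)).run(in).orElse(Some((Nil, in)))
    }

  // String-specific repetition: a single scan over the input,
  // with no per-character binding.
  def takeMatching(p: Char => Boolean): Parser[String] =
    Parser[String] { in =>
      val matched = in.takeWhile(p)
      Some((matched, in.drop(matched.length)))
    }
}
```

Both variants accept the same prefix of the input; the second one simply avoids allocating a chain of closures per character.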

scala> parseText(char('a') <+> char('b'), "baba")
res7: Either[parsecat.ParseError[parsecat.parsers.TextPosition],Char] = Right(b)

Note: the last example uses the <+> operator. This is the associative monoid operator, or sum (in contrast to the product provided by the applicative functor). It first applies the parser on its left; if that parser fails, the parser on the right side of the expression is applied instead. The same can be expressed with the help of the orElse combinator:

scala> parseText(orElse(char('a'), char('b')), "baba")
res9: Either[parsecat.ParseError[parsecat.parsers.TextPosition],Char] = Right(b)
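The "try left, then right" behaviour can be sketched without the library. The toy Parser type below is hypothetical; in particular, it always hands the original input to the right-hand parser, whereas the source does not state how Parsecat treats input that the left-hand parser consumed before failing.

```scala
object ChoiceSketch {
  // Hypothetical toy parser, not Parsecat's ParserT.
  final case class Parser[A](run: String => Option[(A, String)]) {
    // Try this parser first; if it fails, run `that` on the same input.
    def <+>(that: => Parser[A]): Parser[A] =
      Parser[A](in => run(in).orElse(that.run(in)))
  }

  def char(c: Char): Parser[Char] =
    Parser[Char](in => in.headOption.filter(_ == c).map(ch => (ch, in.tail)))

  // Named form of the same choice.
  def orElse[A](left: Parser[A], right: => Parser[A]): Parser[A] =
    left <+> right
}
```

With these definitions, char('a') <+> char('b') applied to "baba" fails on the left, succeeds on the right, and leaves "aba" as the remaining input.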

Numeric parsers

Numeric parsers are an extension to the character parsers that adds parsing support for numeric literals. The following import is required:

scala> import parsecat.parsers.numeric._
import parsecat.parsers.numeric._

Examples:

scala> parseText(integer, "1234567")
res10: Either[parsecat.ParseError[parsecat.parsers.TextPosition],Int] = Right(1234567)


scala> parseText(double, "1.2345E67")
res11: Either[parsecat.ParseError[parsecat.parsers.TextPosition],Double] = Right(1.2345E67)

All common numeric types are supported: byte, short, int, long, float, double, bigInt and bigDecimal.

JSON parser

The JSON parser was added as a reference implementation and a good example of the expressive power of parser combinators. The entire implementation is less than 60 LOC and could have been a part of this README, but instead you can find it here: https://github.com/izeigerman/parsecat/blob/master/json/src/main/scala/parsecat/parsers/json/JsonParsers.scala.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import parsecat.parsers.json._

val jsonStr =
  """{
    |  "field1": "test",
    |  "field2": [
    |    1, 2, 3
    |  ],
    |  "field3": {
    |    "field4": true,
    |    "field5": null,
    |    "field6": [
    |      { "field7": 1.234 }, { "field8": false }
    |    ]
    |  }
    |}""".stripMargin


parseJson(jsonStr)


// Exiting paste mode, now interpreting.


res12: Either[parsecat.ParseError[parsecat.parsers.TextPosition],parsecat.parsers.json.JsValue] = Right(JsObject(Map(field1 -> JsString(test), field2 -> JsArray(List(JsInt(1), JsInt(2), JsInt(3))), field3 -> JsObject(Map(field4 -> JsBoolean(true), field5 -> JsNull, field6 -> JsArray(List(JsObject(Map(field7 -> JsDouble(1.234))), JsObject(Map(field8 -> JsBoolean(false))))))))))

NOTE: this parser was created as an example and a reference implementation and should never be used in a real project. Although general parsing performance was significantly improved in version 0.2.0, it still can't compete with any modern hand-written JSON parser. Its performance is the same as that of the scala-parser-combinators JSON parser, which is now deprecated.