Parsecat
Parsecat is a lightweight, pure-functional parser monad transformer and combinator library, which supports both applicative and monadic styles of parsing.
Usage
Supports Scala 2.12 and 2.13.
libraryDependencies += "com.github.izeigerman" %% "parsecat-core" % "0.3.0"
// to include JSON parsers
libraryDependencies += "com.github.izeigerman" %% "parsecat-json" % "0.3.0"
Text parser
The text parser is just a type alias for a specialized ParserT:
type TextParser[A] = ParserT[Id, PagedStream[Char], TextParserContext, TextPosition, A]
It supports 2-dimensional (row and column) position tracking and comes with a variety of string and character parsers.
Character and String parsers
Before using the parser, the following imports are required:
scala> import cats.implicits._
import cats.implicits._
scala> import parsecat.parsers.string._
import parsecat.parsers.string._
Let's parse the following string:
scala> val str = "Hello World"
str: String = Hello World
A parser for this string can be defined in a couple of ways. For example:
scala> val parserA = andThen(string("Hello") <* space, string("World"))
parserA: parsecat.ParserT[cats.Id,parsecat.stream.PagedStream[Char],parsecat.parsers.TextParserContext,parsecat.parsers.TextPosition,(String, String)] = parsecat.ParserT@6f92758f
The parser above relies on the applicative properties of ParserT and uses the <* operator to parse both the string "Hello" and a whitespace character, keeping only the result of the first parser. andThen is one of many combinator functions included in this library. The full list can be found here. Applying this parser to the test string gives the following result:
scala> parseText(parserA, str)
res1: Either[parsecat.ParseError[parsecat.parsers.TextPosition],(String, String)] = Right((Hello,World))
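As a further illustration of the applicative style, here is a minimal sketch that uses two more operations from the cats syntax imported above: *> (the mirror image of <*, which keeps the right-hand result) and map. The object and value names are made up for this example, and it assumes the cats Functor and Apply syntax resolves for ParserT in the same way <* does.

```scala
import cats.implicits._
import parsecat.parsers.string._

// Hypothetical example object; only string, space, andThen and parseText
// come from the README above.
object ApplicativeStyleExample {
  // *> runs both parsers in sequence and keeps only the right-hand result.
  val worldOnly = string("Hello") *> space *> string("World")

  // map transforms the value of a successful parse without reading more input.
  val greeting = andThen(string("Hello") <* space, string("World"))
    .map { case (hello, world) => s"$hello, $world!" }

  def main(args: Array[String]): Unit = {
    println(parseText(worldOnly, "Hello World"))
    println(parseText(greeting, "Hello World"))
  }
}
```

Both parsers consume the same input as parserA; they differ only in which parts of the result they keep.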
There is also a different style of combining parsers: the monadic one. Exactly the same parser can be defined differently:
scala> val parserM = andThen(string("Hello"), space >> string("World"))
parserM: parsecat.ParserT[cats.Id,parsecat.stream.PagedStream[Char],parsecat.parsers.TextParserContext,parsecat.parsers.TextPosition,(String, String)] = parsecat.ParserT@7daf7131
This time we rely on the monadic properties of ParserT and use the >> operator, a special binding operator which discards the result of the first action. The result produced by this parser is exactly the same:
scala> parseText(parserM, str)
res2: Either[parsecat.ParseError[parsecat.parsers.TextPosition],(String, String)] = Right((Hello,World))
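The monadic style can also be written as a for-comprehension. The sketch below assumes that map and flatMap are available on ParserT, either as its own methods or through the cats Monad syntax that already provides >>. The names MonadicStyleExample and parserFor are made up for this example.

```scala
import cats.implicits._
import parsecat.parsers.string._

// Hypothetical example object illustrating the monadic style.
object MonadicStyleExample {
  // Each generator runs one parser; later steps could inspect earlier results.
  val parserFor = for {
    hello <- string("Hello")
    _     <- space            // parsed, but the result is discarded
    world <- string("World")
  } yield (hello, world)

  def main(args: Array[String]): Unit =
    println(parseText(parserFor, "Hello World"))
}
```

This should behave like parserM. The for-comprehension form becomes more useful when a later parser depends on a value produced by an earlier one, which the applicative style cannot express.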
When parsing fails, the error contains detailed information about what went wrong:
scala> import parsecat.parsers.regex._
import parsecat.parsers.regex._
scala> val str = """
| Hello
| World
| """.stripMargin
str: String =
"
Hello
World
"
scala> val parser = andThen(eol >> string("Hello"), eol >> regex("o.+d".r))
parser: parsecat.ParserT[cats.Id,parsecat.stream.PagedStream[Char],parsecat.parsers.TextParserContext,parsecat.parsers.TextPosition,(String, String)] = parsecat.ParserT@789c90f4
scala> parseText(parser, str)
res4: Either[parsecat.ParseError[parsecat.parsers.TextPosition],(String, String)] = Left(parsecat.ParseError: [Parsecat] (row 3, column 1): input doesn't match regex 'o.+d')
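Since parseText returns an Either, both outcomes can be handled with an ordinary pattern match. The following is a minimal sketch; the object and the describe helper are made up for this example.

```scala
import cats.implicits._
import parsecat.parsers.string._

// Hypothetical example object showing how to consume a parse result.
object ErrorHandlingExample {
  val parser = andThen(string("Hello") <* space, string("World"))

  // Left carries the ParseError, Right carries the parsed value.
  def describe(input: String): String =
    parseText(parser, input) match {
      case Right((first, second)) => s"parsed '$first' and '$second'"
      case Left(error)            => s"failed: $error"
    }

  def main(args: Array[String]): Unit = {
    println(describe("Hello World"))
    println(describe("Hello Scala")) // the second word does not match
  }
}
```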
Here are some other parser application examples:
scala> parseText(many(char('a')), "aabb")
res5: Either[parsecat.ParseError[parsecat.parsers.TextPosition],List[Char]] = Right(List(a, a))
scala> parseText(manyTill(anyChar, char('b')), "aaab")
res6: Either[parsecat.ParseError[parsecat.parsers.TextPosition],List[Char]] = Right(List(a, a, a))
Note: the examples above use the many and manyTill combinators. Although this approach looks appealing, it can create a large number of monadic bindings at runtime, which may lead to considerable performance degradation. Use these combinators carefully and consider the string-specific alternatives (satisfyMany, anyCharTill, oneOfMany, etc.) instead. The same is true for the many1, skipMany and skipMany1 combinators.
scala> parseText(char('a') <+> char('b'), "baba")
res7: Either[parsecat.ParseError[parsecat.parsers.TextPosition],Char] = Right(b)
Note: the last example uses the <+> operator. This is the associative operator of a monoid, i.e. a sum (as opposed to the product provided by the applicative functor). It first applies the parser on its left and, if that parser fails, applies the parser on its right instead. The same can be expressed with the help of the orElse combinator:
scala> parseText(orElse(char('a'), char('b')), "baba")
res9: Either[parsecat.ParseError[parsecat.parsers.TextPosition],Char] = Right(b)
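Because <+> is associative, alternatives can be chained and then combined with other combinators. The sketch below assumes char and many are exported by parsecat.parsers.string._ (in the session above they were used with both the string and regex imports in scope). The names vowel and vowels are made up for this example.

```scala
import cats.implicits._
import parsecat.parsers.string._

// Hypothetical example object; assumes char and many come from string._.
object AlternativesExample {
  // Alternatives are tried left to right; the first one that succeeds wins.
  val vowel = char('a') <+> char('e') <+> char('i') <+> char('o') <+> char('u')

  // An alternative is itself a parser, so it can be passed to many.
  val vowels = many(vowel)

  def main(args: Array[String]): Unit = {
    println(parseText(vowel, "idea"))
    println(parseText(vowels, "aeiou!"))
  }
}
```

The performance note about many applies here as well; for long inputs a string-specific combinator is likely the better choice.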
Numeric parsers
Numeric parsers are an extension to the character parsers that adds parsing support for numeric literals. The following import is required:
scala> import parsecat.parsers.numeric._
import parsecat.parsers.numeric._
Examples:
scala> parseText(integer, "1234567")
res10: Either[parsecat.ParseError[parsecat.parsers.TextPosition],Int] = Right(1234567)
scala> parseText(double, "1.2345E67")
res11: Either[parsecat.ParseError[parsecat.parsers.TextPosition],Double] = Right(1.2345E67)
All common numeric types are supported: byte, short, int, long, float, double, bigInt and bigDecimal.
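Numeric parsers compose with the character parsers like any other parser. The following sketch parses two comma-separated integers. The object name and the input format are made up for this example, and the numeric import is placed in a nested scope (similar to what the REPL does with successive imports) as a precaution, in case parsecat.parsers.numeric._ exports names that parsecat.parsers.string._ also provides.

```scala
import cats.implicits._
import parsecat.parsers.string._

// Hypothetical example object combining numeric and character parsers.
object NumericPairExample {
  // Nested import: shadows same-named members of string._ instead of clashing.
  import parsecat.parsers.numeric._

  // Parses "<integer>,<integer>"; <* drops the comma from the result.
  val pair = andThen(integer <* char(','), integer)

  def main(args: Array[String]): Unit =
    println(parseText(pair, "12,34"))
}
```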
JSON parser
The JSON parser was added as a reference implementation and a good example of the expressive power of parser combinators. The entire implementation is fewer than 60 LOC and could have been a part of this README, but instead you may find it here: https://github.com/izeigerman/parsecat/blob/master/json/src/main/scala/parsecat/parsers/json/JsonParsers.scala.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import parsecat.parsers.json._
val jsonStr =
"""{
| "field1": "test",
| "field2": [
| 1, 2, 3
| ],
| "field3": {
| "field4": true,
| "field5": null,
| "field6": [
| { "field7": 1.234 }, { "field8": false }
| ]
| }
|}""".stripMargin
parseJson(jsonStr)
// Exiting paste mode, now interpreting.
res12: Either[parsecat.ParseError[parsecat.parsers.TextPosition],parsecat.parsers.json.JsValue] = Right(JsObject(Map(field1 -> JsString(test), field2 -> JsArray(List(JsInt(1), JsInt(2), JsInt(3))), field3 -> JsObject(Map(field4 -> JsBoolean(true), field5 -> JsNull, field6 -> JsArray(List(JsObject(Map(field7 -> JsDouble(1.234))), JsObject(Map(field8 -> JsBoolean(false))))))))))
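The parsed tree can then be inspected with pattern matching. The sketch below assumes that JsObject and JsString are case classes in parsecat.parsers.json wrapping a Map[String, JsValue] and a String respectively, which is what the printed result above suggests but is not confirmed here. The nameOf helper is made up for this example.

```scala
import parsecat.parsers.json._

// Hypothetical example object; the shape of JsObject and JsString is an
// assumption inferred from the REPL output above.
object JsonFieldExample {
  // Returns the string value of the top-level "name" field, if there is one.
  def nameOf(json: String): Option[String] =
    parseJson(json) match {
      case Right(JsObject(fields)) =>
        fields.get("name").collect { case JsString(name) => name }
      case _ => None // parse error, or the top-level value is not an object
    }

  def main(args: Array[String]): Unit =
    println(nameOf("""{ "name": "parsecat", "stable": false }"""))
}
```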
NOTE: this parser was created as an example and a reference implementation and should never be used in a real project. Although the general parsing performance was significantly improved in version 0.2.0, it still can't compete with any modern hand-written JSON parser out there. Its performance is the same as that of the scala-parser-combinators JSON parser, which is now deprecated.