cats-parse icon indicating copy to clipboard operation
cats-parse copied to clipboard

Expression parser

Open christiankjaer opened this issue 2 years ago • 4 comments

Essentially a port of https://hackage.haskell.org/package/parser-combinators-1.3.0/docs/src/Control.Monad.Combinators.Expr.html

  • Added a small microbenchmark
  • Added property based test with arithmetic expressions and friends

christiankjaer avatar Sep 25 '22 12:09 christiankjaer

Thanks for the PR. I think we could add something like this.

I haven't had a chance to review in detail, but I had a couple of comments:

  1. I'd like to see scalacheck test coverage where we generate a random expression, serialize it to string with random whitespace, and then verify that it parses exactly the same.
  2. I'd like some benchmarks to make sure the performance is close to a hand written parser.
  3. I noticed there are a few uses of flatMap. Very few parsers really require that and it definitely hurts performance. Could we try to avoid using flatMap and just use product (or ~, *>, <*) instead?
  4. It seems this is baking in the idea that whitespace is never significant. I wonder if we can make a whitespace Parser0[Unit] as an argument which could potentially be Parser.unit for a case where you don't want to allow any extra whitespace.

johnynek avatar Sep 26 '22 17:09 johnynek

Great. I just published the draft here to check if something like this was wanted. I don't know how many use cats parse for expression parsers.

I will give each of the points a go when I have time. I am not sure about (4), but I will have a think about it.

christiankjaer avatar Sep 26 '22 18:09 christiankjaer

I have not addressed (4). I think that route is still possible with this approach. You just don't munch whitespace in the base parser but in the operators instead, or some combination that suits the use case.

Handling of whitespace is not done at all in the expression parser.

christiankjaer avatar Sep 27 '22 18:09 christiankjaer

Feel like it might be worth highlighting a similar but alternative approach: https://www.cs.tufts.edu/comp/150FP/archive/jamie-willis/parsing-patterns.pdf with a Scala version available in https://dl.acm.org/doi/10.1145/3550198.3550427 (I can get a pdf on request). It is slightly different to the parser-combinators version in that it allows for heterogeneity in the levels of the precedence table. The same mechanism is used by https://github.com/j-mie6/parsley within the parsley.expr package. As it happens, the chain combinators themselves are useful in a library even if precedence is available: I'm not sure if cats-parse already uses them or not.

The precedence scheme encoded by that paper does not have the ternary operators, but does avoid avoid flatMap at all costs: providing efficient implementations of them.

I concur that whitespace parsing is not something that should be handled by the precedence machinery: this is explicitly a convention that should be kept in the "lexer" (see section 3 of those papers). Anecdotally, we use these design patterns at Imperial, for our compilers project and it works wonders compared to when we didn't have them!

j-mie6 avatar Mar 06 '23 08:03 j-mie6