parsecat

Open Maatary opened this issue 3 years ago • 1 comments

Hi,

I am new to parser combinator libraries and still working through the theory, but I am trying to pick a library to work with. What I am after is a library that I can apply to any kind of input. In my current toy exercise project, the goal is not to parse text but an in-memory data structure. In fact, the whole exercise is to layer a parser combinator library on top of a library that allows loading and navigating a graph (a semantic graph, aka an ontology). Given that the semantic graph must respect a specific schema, the goal is to parse graphs that follow that schema.

I looked around, and it seems that in Scala yours is the only library that is not text/string specific. Did I get this wrong? I should be able to use it for any kind of input, right?

Can you give me some guidance on whether I am understanding your library's capabilities correctly, and maybe some tips?

Maatary avatar Sep 14 '21 20:09 Maatary

Hey @Maatary!

Thanks for your interest in this project.

> I should be able to use your for any kind of input right ?

That is correct. You'll be able to reuse the existing parser type, its type class instances and a decent set of generic combinators.

However, there is still a bit of work you'll have to do to apply it in your custom context.

The first thing I encourage you to check out is the definition of TextParser:

type TextParser[A] = ParserT.Parser[PagedStream[Char], TextParserContext, TextPosition, A]

The first 3 type parameters represent the input type, the context type and the position type, respectively. Below are some details on each:

  1. The input type is whatever input you plan to consume, e.g. a byte array, a memory region, or a custom definition that describes your data structure.
  2. The context type is mostly used for optimizations. For example, the text parser uses it to cache the parser error, which avoids creating many temporary error objects and the GC/memory pressure they cause. For a first iteration I suggest skipping this one and providing the Unit type instead. You can do something smarter in later iterations once you have something working.
  3. The position type describes your current parsing position and is used for parser progression and error reporting. For example, the text parser tracks not just the absolute position within the character sequence but also the corresponding row and column numbers, which makes parsing errors easier to debug. Again, in the simplest scenario you might only care about the absolute position, so you can use the Long or Int type directly.
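
To make the third point concrete, here is a minimal sketch of a position type that tracks row and column alongside the absolute offset. The `Pos` class and its `advance` method are hypothetical names chosen for illustration; they are not parsecat's own TextPosition, whose fields may differ.

```scala
// Hypothetical position type for illustration only; not parsecat's TextPosition.
object PositionSketch {
  final case class Pos(offset: Long, row: Int, col: Int) {
    // Advance past one consumed character, starting a new row on '\n'.
    def advance(c: Char): Pos =
      if (c == '\n') Pos(offset + 1, row + 1, 1)
      else Pos(offset + 1, row, col + 1)
  }

  // Rows and columns are 1-based here; the absolute offset is 0-based.
  val start: Pos = Pos(0L, 1, 1)

  // Position reached after consuming every character of `s`.
  def after(s: String): Pos = s.foldLeft(start)((p, c) => p.advance(c))
}
```

If only the absolute offset matters, all of this collapses to a plain Long that is incremented by the number of consumed elements.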

To summarize the points above, the simplest parser for your case may look something like this:

type CustomParser[A] = ParserT.Parser[Array[Byte], Unit, Long, A]

which can later be invoked with the following initial arguments:

parser.parse(<input>, (), 0L)

A typical combinator implementation will look something like this:

ParserT[Id, <Input Type>, Unit, Long, <Return Type>]((pos, input, context, info) => {
  // apply parsing logic to "input" at position "pos"
  
  // if the application was successful return ParseOutput(newPos, nextInput, context, result).asRight where
  // newPos - a newly calculated position after the parser has been applied.
  // newInput - if you want to pass a different input instance. The same instance can be passed too.
  // context - just pass the same context instance.
  // result - whatever was extracted as a result of the parser application. 

  // if the application failed return ParseError(pos, message, info).asLeft where
  // pos - is the current position (no position progression takes place here).
  // message - the error message.
  // info - the info string message shared across all parsers, used to enhance error messages with additional context.
})
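
The shape described in the comments above can be sketched without the library, using stand-in types. Everything in this block (`Output`, `ParseFailure`, `Run`, `satisfy`, `andThen`) is a hypothetical name made up for illustration, with Unit as the context and Long as the position; it is not parsecat's API, only a model of the same success/failure contract over an Array[Byte] input.

```scala
object CombinatorSketch {
  // Stand-ins for the library's result types; names and fields are assumptions.
  final case class Output[A](pos: Long, input: Array[Byte], context: Unit, result: A)
  final case class ParseFailure(pos: Long, message: String, info: String)

  // Same argument order as the lambda above: (pos, input, context, info).
  type Run[A] = (Long, Array[Byte], Unit, String) => Either[ParseFailure, Output[A]]

  // Consume one byte if it satisfies `p`; otherwise fail without moving.
  def satisfy(p: Byte => Boolean): Run[Byte] = (pos, input, context, info) =>
    if (pos >= 0 && pos < input.length && p(input(pos.toInt)))
      Right(Output(pos + 1, input, context, input(pos.toInt))) // position advances by one
    else
      Left(ParseFailure(pos, s"unexpected input at offset $pos", info)) // position unchanged

  // Run `first`, then run `second` from the position that `first` returned.
  def andThen[A, B](first: Run[A], second: Run[B]): Run[(A, B)] = (pos, input, context, info) =>
    first(pos, input, context, info).flatMap { a =>
      second(a.pos, a.input, a.context, info).map { b =>
        Output(b.pos, b.input, b.context, (a.result, b.result))
      }
    }
}
```

Sequencing only works because each successful step hands its new position to the next one; a primitive that forgot to advance would make `andThen` read the same byte twice.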

Don't forget to update the position every time the parser application succeeds. Here is a good reference example of a parser which returns a character that satisfies the given predicate or an error if the predicate returned false.

Hope this helps!

izeigerman avatar Sep 20 '21 16:09 izeigerman