
Add scanning

Open MegaIng opened this issue 8 months ago • 15 comments

An implementation of Lark.scan. Also adds start_pos and end_pos parameters to Lark.parse, Lark.parse_interactive and Lark.lex.

TODO:

  • [x] add example
  • [x] A bit more documentation for what exactly this function does
  • [x] Notes about start_pos and end_pos mirroring the behavior of stdlib re with regard to lookbehind and lookahead.
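
For reference, here is a small sketch of the stdlib re behavior that the last item refers to. It only uses `re` itself (the `pos` and `endpos` arguments of `Pattern.search`), not Lark; the variable names are just for illustration, and how closely the new start_pos and end_pos follow this is what the notes are meant to document.

```python
import re

text = "ab"
lookbehind = re.compile(r"(?<=a)b")
lookahead = re.compile(r"a(?=b)")

# Starting at pos=1 is not the same as slicing: the lookbehind can still
# see the "a" that sits before the start position.
with_pos = lookbehind.search(text, 1)
with_slice = lookbehind.search(text[1:])

# endpos behaves as if the string ended there: the lookahead cannot see
# the "b" that sits at or after the end position.
with_endpos = lookahead.search(text, 0, 1)
without_endpos = lookahead.search(text, 0, 2)

print(with_pos is not None, with_slice is None)
print(with_endpos is None, without_endpos is not None)
```

In other words, the start position keeps the preceding context visible, while the end position truncates the input.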

But I do think the core logic is pretty stable and I would like a review of that already @erezsh.

Future work:

  • Check whether this already works with mmap, or what needs to be done to make it work, so that the text does not have to be loaded into memory at all (this also involves checking up on the byte parsing implementation)
  • Check whether I can implement a custom lexer that uses Python's stdlib tokenize module, which would have a few benefits, especially with regard to the new f-string syntax, and how well that would play with this feature.
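
As background for the mmap item, a minimal sketch showing that stdlib re can match a bytes pattern directly against a memory-mapped file without reading it into a bytes object first. This says nothing about whether Lark handles it yet, which is exactly what needs checking; `find_numbers` and the temporary file are hypothetical and only there for illustration.

```python
import mmap
import os
import re
import tempfile

def find_numbers(path):
    """Return all digit runs in the file, matching against an mmap."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mmap exposes the buffer protocol, so a bytes pattern can be
            # applied to it directly; findall returns plain bytes objects.
            return re.findall(rb"\d+", mm)

# Write a small sample file to map.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a 12 b 345")
    sample_path = tmp.name

numbers = find_numbers(sample_path)
os.remove(sample_path)
print(numbers)
```

Note that the pattern has to be a bytes pattern here, which is why the byte parsing implementation comes into play.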

This PR is based on #1428, so it would be better to merge that one first.

MegaIng, Jun 20 '24 01:06