PikaParser.jl

[Feature request] Accept Regex expressions in `Scan`

Open mofeing opened this issue 2 years ago • 1 comments

Proposal

I often find that some clauses are more easily parsed with a regex than with PikaParser clauses. The current workaround is to use a Scan in a way similar to:

rules = Dict(
    ...,
    :id => PikaParser.scan() do x
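        # `^` anchors the regex so it can only match at the start of `x`
        matched = match(r"^[a-z][a-zA-Z0-9_]*", x)
        isnothing(matched) && return 0  # no match: report length 0
        length(matched.match)           # length of the matched identifier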
    end,
    ...,
)

It would be great if we could just pass the regex to scan.
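Until something like that exists, the closure above can be generated from a regex. This is only a minimal sketch: `regex_scanner` is a hypothetical name, not part of PikaParser, and it keeps the return convention of the snippet above (matched length, or 0 when nothing matches). Instead of requiring a leading `^`, it checks that the match found by `match` starts at the first position, which is correct but may search further than needed.

```julia
# Hypothetical helper (not PikaParser API): turn a regex into a function
# with the same shape as the closure passed to `scan` above.
function regex_scanner(re::Regex)
    return function (x)
        m = match(re, x)
        # `match` returns the leftmost match, so the regex matches at the
        # start of `x` exactly when that match begins at offset 1.
        (isnothing(m) || m.offset != 1) && return 0
        return length(m.match)
    end
end

scan_id = regex_scanner(r"[a-z][a-zA-Z0-9_]*")

scan_id("fooBar_1 = 2")  # 8, the length of "fooBar_1"
scan_id("1abc")          # 0, "abc" matches but not at the start
```

Since `scan() do x ... end` is sugar for `scan(x -> ...)`, the result should be usable as `:id => PikaParser.scan(scan_id)`.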

Unsolved issues

Only regexes of the form r"^..." should be accepted. Without the leading ^ anchor, the regex will search for the pattern anywhere in the remaining input instead of matching only at the current position.
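To illustrate the difference with plain `match` from Base (no PikaParser involved; the input string is made up for the example):

```julia
input = "123abc"

unanchored = match(r"[a-z]+", input)   # free to search the whole input
anchored   = match(r"^[a-z]+", input)  # may only match at the very start

unanchored.match    # "abc"
unanchored.offset   # 4, i.e. the match was found past the start
anchored            # nothing

# Base also offers an anchored test (Julia 1.2+), but it only returns a Bool:
startswith(input, r"[a-z]+")  # false
```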

mofeing, Nov 29 '23 19:11

notes for self when I get to this:

  • we MIGHT want to abuse the do notation for writing folds directly into the grammar, as with bison
  • the usual form of regex semantics is slightly inconvenient for exact matching, and we unfortunately don't have much freedom in regex implementations to support syntax similar to e.g. flex. To simplify matters, I'd suggest we add 2 helper functions: regex_to (scans everything up to the match, including the match) and regex_before (scans everything up to the match, without the match), each with an optional argument to select which match group is used. (Can be done by taking .offset and .ncodeunits from m.match or m.captures[N].)
  • The "to" and "before" variants also allow folks to implement various useful stuff like not_followed_by directly into lexing
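A rough sketch of what the two helpers from the notes could look like, written against Base only. Everything here is an assumption for illustration: the names `regex_to`, `regex_before` and `match_span`, the `group` keyword, returning `nothing` when there is no match, and counting in code units are not settled anywhere in this thread. The sketch reads `m.offset` / `m.offsets[N]` and `ncodeunits(...)` rather than the fields of the matched `SubString`, since the `RegexMatch` offsets are relative to the string passed to `match`, which seems the safer choice when `x` is itself a view.

```julia
# Hypothetical helpers sketched from the notes above; not PikaParser API.
# Both return a number of code units counted from the start of `x`, or
# `nothing` if the regex (or the requested group) did not match.

# 1-based start offset and size in code units of the whole match
# (group == 0) or of capture group number `group`.
function match_span(m::RegexMatch, group::Integer)
    group == 0 && return (m.offset, ncodeunits(m.match))
    cap = m.captures[group]
    isnothing(cap) && return nothing  # group did not take part in the match
    return (m.offsets[group], ncodeunits(cap))
end

# Everything before the match (or before the selected group).
function regex_before(re::Regex, x::AbstractString; group::Integer = 0)
    m = match(re, x)
    isnothing(m) && return nothing
    span = match_span(m, group)
    isnothing(span) && return nothing
    return span[1] - 1
end

# Everything up to and including the match (or the selected group).
function regex_to(re::Regex, x::AbstractString; group::Integer = 0)
    m = match(re, x)
    isnothing(m) && return nothing
    span = match_span(m, group)
    isnothing(span) && return nothing
    return span[1] - 1 + span[2]
end

regex_to(r"\*/", "abc*/ rest")      # 5, covers "abc*/"
regex_before(r"\*/", "abc*/ rest")  # 3, covers "abc"

# A not_followed_by-style check: take "if" only when the next character
# is not part of an identifier, without consuming that character.
regex_before(r"^if([^a-zA-Z0-9_])", "if(x)"; group = 1)  # 2
regex_before(r"^if([^a-zA-Z0-9_])", "iffy"; group = 1)   # nothing
```

Returning `nothing` instead of 0 for a failed match keeps "no match" distinct from "match at the very start" in `regex_before`; the wrapper that feeds these into a scan would have to translate that into whatever the scan function is expected to return.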

exaexa, Dec 01 '23 16:12