PikaParser.jl
[Feature request] Accept Regex expressions in `Scan`
Proposal
I often find that some clauses are more easily parsed with a regex than with PikaParser clauses. The current workaround is to use a `Scan` in a way similar to:
```julia
rules = Dict(
    ...,
    :id => PikaParser.scan() do x
        matched = match(r"^[a-z][a-zA-Z0-9_]*", x)
        isnothing(matched) && return 0
        length(matched.match)
    end,
    ...,
)
```
It would be great if we could just pass the regex to scan.
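A minimal sketch of what "just pass the regex" could look like today, written as a plain Julia helper outside the library. `regex_scan` is a hypothetical name. The sketch assumes that `PikaParser.scan` accepts any callable and that the snippet's convention of returning 0 for "no match" is the right one for the PikaParser version in use; it also reports the match width in code units, on the assumption that the parser advances by string indices rather than by characters.

```julia
# Hypothetical helper, not part of PikaParser's API: wrap a regex into a
# function that could be passed to `PikaParser.scan`, as in the snippet above.
function regex_scan(re::Regex)
    return function (x)
        m = match(re, x)
        # No match at all: same "no match" value (0) as the snippet above.
        isnothing(m) && return 0
        # A regex without `^` may find a match later in the input; only a
        # match starting at the current position (index 1 of `x`) counts.
        m.offset == 1 || return 0
        # Width of the match in string indices (code units); this differs
        # from `length`, which counts characters, for non-ASCII text.
        return ncodeunits(m.match)
    end
end
```

It would then be used as `:id => PikaParser.scan(regex_scan(r"^[a-z][a-zA-Z0-9_]*"))`. The `m.offset == 1` check keeps unanchored patterns correct, but it does not stop them from searching the rest of the input before being rejected.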
Unsolved issues
Only regexes of the form `r"^..."` should be accepted: without the `^` anchor, the regex searches for the pattern across the whole remaining input instead of matching only at the current position.
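Instead of rejecting unanchored patterns, one option is to anchor them automatically. The sketch below recompiles a regex as `\A(?:...)`, so PCRE can only match at the start of the subject; `\A` is used rather than `^` because `^` also matches after newlines under the `m` flag. `anchored` is a hypothetical helper, and it relies on Base's `Regex` fields `pattern`, `compile_options`, `match_options` and its three-argument constructor, which are internals that could change between Julia versions.

```julia
# Hypothetical helper (not part of PikaParser): recompile `re` so that it can
# only match at the very start of the subject string, keeping its flags.
function anchored(re::Regex)
    # `\A` matches only at the start of the subject; the non-capturing group
    # keeps alternations such as `a|b` anchored as a whole.
    return Regex("\\A(?:" * re.pattern * ")", re.compile_options, re.match_options)
end
```

A pattern ending in an `x`-mode `#` comment would swallow the closing parenthesis, so this simple wrapping is not fully general.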
notes for self when I get to this:
- we MIGHT want to abuse the `do` notation for writing folds directly into the grammar, as with bison
- the usual form of regex semantics is slightly inconvenient for exact matching, and unfortunately we don't have much freedom in regex implementations to support syntax similar to e.g. flex. To simplify matters, I'd suggest two helper functions: `regex_to` (scans everything up to and including the match) and `regex_before` (scans everything up to the match, excluding it), each with an optional argument to select which match group is used. (This can be done by taking the offset and `ncodeunits` of `m.match` or `m.captures[N]`.)
- the "to" and "before" variants also allow folks to implement various useful things like `not_followed_by` directly in lexing
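The two helpers from the notes above could be sketched roughly as follows. The names come from the notes, but the signatures, the `group` keyword, and returning `nothing` for "no match" are assumptions; a caller would still map `nothing` to whatever "no match" value `Scan` expects. Positions are read from `m.offset` and `m.offsets[N]`, which are indices into the searched string `x`. The `.offset` field of a `SubString` counts from its parent string, which can differ from `x` when `x` is itself a view.

```julia
# Hypothetical helpers sketched from the notes above; not part of PikaParser.
# Both return the number of code units to consume from the start of `x`,
# or `nothing` when the regex (or the selected group) does not match.

# Consume everything up to and including the match (or capture `group`).
function regex_to(re::Regex, x::AbstractString; group::Int = 0)
    m = match(re, x)
    isnothing(m) && return nothing
    group == 0 && return m.offset - 1 + ncodeunits(m.match)
    cap = m.captures[group]
    isnothing(cap) && return nothing          # group did not participate
    return m.offsets[group] - 1 + ncodeunits(cap)
end

# Consume everything before the match (or capture `group`), excluding it.
function regex_before(re::Regex, x::AbstractString; group::Int = 0)
    m = match(re, x)
    isnothing(m) && return nothing
    group == 0 && return m.offset - 1
    isnothing(m.captures[group]) && return nothing
    return m.offsets[group] - 1
end
```

Because these helpers search forward on purpose, they are meant for unanchored patterns, e.g. `regex_to(r"\*/", x)` to consume the remainder of a block comment.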