pest
Allow manual rule definition for complex rules
While the .pest grammar format is quite flexible, there are circumstances under which it's incapable of expressing what's required. Writing the rule manually can solve the problem, but it seems like Pest either supports automatic generation for every function, no exceptions, or you must manually define everything.
I'm facing a situation where a single rule out of hundreds cannot be expressed in the grammar.
Allowing for manual function definition in addition to automatic definition would solve this.
For example, imagine a grammar like:
char = _{ '\u{01}'..'\u{7f}' }
number = { ('0'..'9')+ }
literal = { "{" ~ number ~ "}" ~ char* }
Where the parser needs to handle elements like "{4}testX..." being parsed as ( 4, "test" ) and the X... part is not consumed, but left for the next element. In order for this to work, number needs to be converted and employed to consume a fixed number of char. Easily handled with a manual parser.
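To make the manual approach concrete, here is a minimal sketch in plain Rust of what such a hand-written rule might do. It does not use Pest at all: parse_literal is a hypothetical name and its signature is an assumption for illustration, not the function shape Pest generates or expects.

```rust
/// Hypothetical hand-written parser for the `literal` rule above.
/// This is NOT part of Pest's API; the name and signature are assumptions.
///
/// On success returns (declared length, payload, unconsumed remainder).
fn parse_literal(input: &str) -> Option<(usize, &str, &str)> {
    // The element must start with an opening brace.
    let rest = input.strip_prefix('{')?;

    // Everything up to the closing brace is the `number` rule: ('0'..'9')+
    let close = rest.find('}')?;
    let digits = &rest[..close];
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Convert the matched digits into a length (fails on overflow).
    let n: usize = digits.parse().ok()?;

    // Consume exactly `n` characters. `get` returns None if fewer than
    // `n` bytes are available or the cut is not on a char boundary.
    let after = &rest[close + 1..];
    let payload = after.get(..n)?;

    // `char` in the grammar is '\u{01}'..'\u{7f}', i.e. one byte each.
    if !payload.bytes().all(|b| (0x01..=0x7f).contains(&b)) {
        return None;
    }

    // Whatever follows is left for the next element.
    Some((n, payload, &after[n..]))
}
```

Called as parse_literal("{4}testX..."), this yields the length 4, the payload "test", and the remainder "X...", which is the behaviour described above.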
I'd like to propose an alternative syntax for situations like this:
char = _{ '\u{01}'..'\u{7f}' }
number = { ('0'..'9')+ }
literal = fn
Where that indicates literal is a manually defined function and is called as Self::literal(...) instead.
I've been working on a fork which implements this where it should be able to handle this with some very minor alterations, but would appreciate some feedback and assistance.
I'm not sure I understand your example. There is no indication of what the rule for splitting the characters that come after the closing bracket should be. The logical inconsistencies, like in this sentence:
"In order for this to work, number needs to be converted and employed to consume a fixed number of char."
where number comes in despite not containing a call to the char rule, on top of the aforementioned lack of information, make it very hard to provide help.
As far as I can tell, what you're trying to do is already easily possible, but I'm not even sure of what you're actually trying to do.
Things that could help:
- Fixed repetition: char{4} matches precisely 4 instances of char
- Look-ahead: { (!"}" ~ ANY)* } for example matches anything up until a closing bracket without consuming the closing bracket
Given that you opened this issue quite a long time ago, I hope you've since found the answer; I'm mostly answering this in case other people bump into this issue. Have a nice day :)
This came about due to a very bizarre feature of the IMAP specification where a "synchronizing literal" is delimited this way. What's needed is for the sequence {4}ABCDEF.. to be parsed as the tokens 4, ABCD with the EF... part not consumed, as it's another sequence. Where it's {2}ABCDEF then it parses as 2, AB with CDEF... left alone.
The {n} part means the following n octets are part of the token, then the remainder reverts to regular parsing.
I might be understanding Pest incorrectly, but I need to extract the number, convert it to an integer, then step through the string n characters exactly. It's nice you can repeat using a similar notation in a grammar, but this length is unknown until user input is processed. It could be anything. I can't match a precise number because that number is run-time generated.
Additionally there's no delimiter that can be used to capture the end, it's just a series of random octets, no context given other than the length identifier.
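As a rough illustration of that requirement, here is a sketch in plain Rust that works on raw octets rather than characters. It follows the simplified {n} form used in this thread, not the full IMAP wire syntax, and split_literal and LiteralError are hypothetical names invented for the example, not anything from Pest or the fork.

```rust
/// Hypothetical error type for this sketch (not from Pest).
#[derive(Debug, PartialEq)]
enum LiteralError {
    /// The input does not start with a well-formed `{n}` prefix.
    Malformed,
    /// The prefix is valid, but `needed` more octets must arrive first.
    Incomplete { needed: usize },
}

/// Splits `{n}` + n octets off the front of `input`.
/// Returns (payload, remainder); the payload length is the declared `n`.
fn split_literal(input: &[u8]) -> Result<(&[u8], &[u8]), LiteralError> {
    // Require the opening brace.
    let rest = match input.split_first() {
        Some((&b'{', rest)) => rest,
        _ => return Err(LiteralError::Malformed),
    };

    // The length is only known once the digits in the input are read.
    let close = rest
        .iter()
        .position(|&b| b == b'}')
        .ok_or(LiteralError::Malformed)?;
    let digits = &rest[..close];
    if digits.is_empty() || !digits.iter().all(|b| b.is_ascii_digit()) {
        return Err(LiteralError::Malformed);
    }
    let n: usize = std::str::from_utf8(digits)
        .ok()
        .and_then(|s| s.parse().ok())
        .ok_or(LiteralError::Malformed)?;

    // No delimiter marks the end: take exactly `n` octets, whatever they are.
    let body = &rest[close + 1..];
    if body.len() < n {
        return Err(LiteralError::Incomplete { needed: n - body.len() });
    }
    Ok(body.split_at(n))
}
```

With this sketch, split_literal(b"{4}ABCDEF..") gives the payload ABCD and leaves EF.. for regular parsing, while split_literal(b"{2}ABCDEF") gives AB and leaves CDEF. Reporting a short input separately from a malformed one is a design choice of the example, since a caller reading from a network stream would likely want to wait for more data rather than fail.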
Hope that explains better. Even if this was (somehow?) accommodated by Pest in the grammar itself, being able to drop down and implement it in very specific detail still seems like it could be useful from time to time. Right now it seems like you can either hand-assemble your entire grammar, or have it all auto-generated, with no opportunity to selectively switch. This fork allows you to define functions that are mapped into your auto-generated grammar, which I think could be helpful.