pest Allow manual rule definition for complex rules

While the .pest grammar format is quite flexible, there are circumstances in which it can't express what's required. Writing the rule manually can solve the problem, but Pest seems to be all-or-nothing: either every rule function is generated automatically, no exceptions, or you must define everything manually.

I'm facing a situation where a single rule out of hundreds can't be expressed in the grammar.

Allowing for manual function definition in addition to automatic definition would solve this.

For example, imagine a grammar like:

char = _{ '\u{01}'..'\u{7f}' }
number = { ('0'..'9')+ }
literal = { "{" ~ number ~ "}" ~ char* }

Here the parser needs to handle input like "{4}testX...", which should parse as (4, "test"), with the X... part not consumed but left for the next element. In order for this to work, number needs to be converted and employed to consume a fixed number of char. This is easily handled with a manual parser.

I'd like to propose an alternative syntax for situations like this:

char = _{ '\u{01}'..'\u{7f}' }
number = { ('0'..'9')+ }
literal = fn

Here, fn indicates that literal is a manually defined function, which is called as Self::literal(...) instead of being generated.

I've been working on a fork that implements this. It should be able to handle this case with some very minor alterations, but I would appreciate some feedback and assistance.

May 27 '22 14:05 tadman

I'm not sure I understand your example. There is no indication of what the rule for splitting the characters that come after the closing bracket should be. There are also logical inconsistencies, as in this sentence:

In order for this to work, number needs to be converted and employed to consume a fixed number of char.

Here number comes into play despite not containing a call to the char rule. That, on top of the aforementioned lack of information, makes it very hard to provide help.

As far as I can tell, what you're trying to do is already easily possible, but I'm not even sure of what you're actually trying to do.

Things that could help:

Fixed repetition: char{4} matches precisely 4 instances of char
Look-ahead: { (!"}" ~ ANY)* } for example matches anything up until a closing bracket without consuming the closing bracket

Given that you opened this issue quite a long time ago, I hope you've since found the answer; I'm mostly answering this in case other people bump into this issue. Have a nice day :)

Oct 27 '22 14:10 HoloTheDrunk

This came about due to a very bizarre feature of the IMAP specification where a "synchronizing literal" is delimited this way. What's needed is for the sequence {4}ABCDEF... to be parsed as the tokens 4, ABCD with the EF... part not consumed, as it belongs to another sequence. If it's {2}ABCDEF... instead, then it parses as 2, AB with CDEF... left alone.

The {n} part means the following n octets are part of the token, then the remainder reverts to regular parsing.

I might be understanding Pest incorrectly, but I need to extract the number, convert it to an integer, then step through the string n characters exactly. It's nice you can repeat using a similar notation in a grammar, but this length is unknown until user input is processed. It could be anything. I can't match a precise number because that number is run-time generated.

Additionally there's no delimiter that can be used to capture the end, it's just a series of random octets, no context given other than the length identifier.
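
To make the requirement concrete, here is a minimal sketch of the hand-written logic in plain Rust, independent of Pest. The function name parse_literal and its return shape are hypothetical, chosen only for illustration. It follows the simplified {n}octets form used in this thread and ignores the CRLF that real IMAP places after the closing brace.

```rust
/// Parse a length-prefixed literal of the simplified form `{n}` followed by
/// exactly `n` octets. Returns the declared length, the literal's octets,
/// and the unconsumed remainder, or `None` if the input does not match.
/// (Hypothetical helper: not part of Pest or of the fork discussed here.)
fn parse_literal(input: &[u8]) -> Option<(usize, &[u8], &[u8])> {
    // The literal must open with '{'.
    let (&first, rest) = input.split_first()?;
    if first != b'{' {
        return None;
    }

    // Locate the '}' that ends the length field.
    let close = rest.iter().position(|&b| b == b'}')?;
    let digits = &rest[..close];

    // The length field must be one or more ASCII digits.
    if digits.is_empty() || !digits.iter().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Convert the digits to an integer; this value is only known at run time.
    let n: usize = std::str::from_utf8(digits).ok()?.parse().ok()?;

    // Consume exactly n octets after the '}' and leave the rest untouched.
    let body = &rest[close + 1..];
    if body.len() < n {
        return None;
    }
    let (literal, remainder) = body.split_at(n);
    Some((n, literal, remainder))
}
```

With this sketch, parse_literal(b"{4}ABCDEF") yields the length 4, the octets ABCD, and the remainder EF, while b"{2}ABCDEF" yields 2, AB, and CDEF. The point is that the split position depends on a value read from the input itself, which is the step a static repetition count in the grammar can't express.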

Hope that explains better. Even if this was (somehow?) accommodated by Pest in the grammar itself, being able to drop down and implement it in very specific detail still seems like it could be useful from time to time. Right now it seems like you can either hand-assemble your entire grammar, or have it all auto-generated, with no opportunity to selectively switch. This fork allows you to define functions that are mapped into your auto-generated grammar, which I think could be helpful.

Oct 29 '22 22:10 tadman
