Arpeggio
Support for injecting extra rules written in Python into PEG grammars
It would be useful to be able to inject rules written as Python functions into PEG grammars. This would accomplish two things:
- Greater portability for libraries. I could publish a library with Python functions which anyone could use regardless of whether they're using the peg, cleanpeg or python parsers. Python functions, although more cumbersome to write, are more composable.
- It would allow the user to write special rules that respect whitespace in PEG files, while skipping whitespace in the rest of the rules. I believe this is currently impossible without rewriting the whole grammar in Python.
I'd suggest the following API:
from lib.external import rule1, rule2
from arpeggio.cleanpeg import ParserPEG
parser = ParserPEG(calc_grammar,
                   "calc",
                   extra_rules={'rule_name1': rule1, 'rule_name2': rule2})
The user could then use 'rule_name1' and 'rule_name2' in the grammar file, and the rules would be resolved automatically. There might be a problem with name clashes between user-defined rules and inner rules defined by the external functions, though. I'm not familiar enough with Arpeggio's internals to be sure.
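For illustration, rule1 and rule2 in the example above could simply be ordinary Arpeggio Python-grammar rules, i.e. functions returning parsing expressions. A minimal sketch of what lib.external might contain (the module and rule names are hypothetical, and the extra_rules parameter itself is only the proposed API, not something Arpeggio provides today):

# lib/external.py - hypothetical module from the example above.
# Ordinary Arpeggio Python-grammar rules: functions returning parsing expressions.
from arpeggio import RegExMatch, Optional

def rule1():
    # e.g. a floating point literal
    return RegExMatch(r'\d+\.\d+')

def rule2():
    # an optionally signed floating point literal; a tuple means a sequence
    return Optional('-'), rule1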
I'm planning a more general approach to parser composability.
Something like this:
from lib.external import rule1, rule2
from arpeggio import GrammarPython, GrammarPEG, GrammarCPEG, Parser
...
parser = Parser(GrammarPython(calc), GrammarPEG(calc_override_in_peg),
                GrammarPython(rule1, rule2), GrammarCPEG(clean_peg_addition))
Grammar* callables will know how to read a grammar written in a particular style and transform it to the internal grammar representation known to the Parser class. Parser will do grammar composition and full resolving using some predetermined override rule (e.g. rules that come later in the grammar list override former rules with the same name).
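To make the override rule concrete, the resolution step could conceptually be a name-keyed merge applied in list order. A minimal pure-Python sketch of that idea (the rule objects and names here are placeholders, not Arpeggio API):

def compose(*grammars):
    # Each grammar is assumed to already be normalized by a Grammar* callable
    # into a dict mapping rule name -> rule object.
    rules = {}
    for grammar in grammars:
        # Later grammars override earlier rules with the same name.
        rules.update(grammar)
    return rules

base = {'calc': 'calc-rule', 'number': 'int-only-number'}
override = {'number': 'float-aware-number'}
print(compose(base, override))   # 'number' comes from the later grammar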
In this approach you could mix and match grammars written in different styles. E.g., you could do the override in PEG or in clean PEG or in some other form. You could write your own Grammar* wrapper, specify the grammar however you see fit, and still be able to compose it with other grammars.
Grammars could be incomplete, i.e. rules could reference nonexistent rules, thus providing a kind of extension point. Of course, when forming the final parser all the rules must be available.
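As a rough sketch of what such an extension point might look like with the proposed (not yet implemented) Grammar*/Parser API: a clean PEG fragment could reference a rule that only a later grammar in the list defines. Everything below is illustrative only; the composition call is shown as a comment because the proposed classes don't exist yet.

from arpeggio import RegExMatch

# Clean PEG fragment that references 'number' without defining it
# (the extension point).
base_grammar = """
calc = expression+ EOF
expression = term (('+' / '-') term)*
term = number
"""

def number():
    # Supplied separately and resolved when the final parser is formed.
    return RegExMatch(r'\d+(\.\d+)?')

# With the proposed API this might compose as:
# parser = Parser(GrammarCPEG(base_grammar), GrammarPython(number))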
Additionally, in the list of grammars you would be able to use ParsingExpressions directly, thus enabling work in a parser-combinator style.
All this stuff requires some non-trivial changes to the core, though.
It sounds excellent! Although I like the possibility of referring to rules of a different grammar, I think there should be a "blackbox" option that allows you to hide the inner rules of a grammar. This way you could use rules written by different authors without worrying about name collisions.
Anyway, thanks for writing Arpeggio and making it available for free. It's a great library and the documentation is among the best I've ever read.
Just FYI, this is how I did it — https://github.com/vuvova/gdb-tools/blob/64a9280/duel/parser.py
The main grammar starts at line 52; note the token cast in line 72. And see above how it's created as a separate Arpeggio parser (line 26) which is tried later: if that succeeds the token matches, otherwise it doesn't.
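The general shape of that trick, as I understand it, is a small standalone Arpeggio parser used as a yes/no test for a token. The sketch below is not the code from the linked file; the rule and helper names are made up:

from arpeggio import ParserPython, RegExMatch, NoMatch, EOF

def type_name():
    # Hypothetical token grammar: a C-like identifier, anchored to end of input.
    return RegExMatch(r'[A-Za-z_][A-Za-z0-9_]*'), EOF

_token_parser = ParserPython(type_name, skipws=False)

def looks_like_type_name(text):
    # Try the secondary parser; if it parses, the token matches.
    try:
        _token_parser.parse(text)
        return True
    except NoMatch:
        return False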
It'd be cleaner to inherit from Match rather than monkey-patch it, but Arpeggio doesn't allow it at the moment.
Thanks. It would indeed be better if a new Match-derived class were used. What do you get if you try to inherit? I haven't tried it myself, but it should generally work, or at least it should be easily fixable if it doesn't work at the moment. I looked into the implementation of parser construction, and in general Match-derived class instances should be handled at this line.
Maybe I used an older version? There was isinstance(..., Match), as far as I remember.
You can try to inherit with a dummy class, like

class MatchChild(Match):
    pass

and see where it won't work. It should be easily fixable, I agree.
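For what it's worth, isinstance() does accept subclasses, so a Match-derived class should pass an isinstance(..., Match) check. A quick sanity check (MyToken is just an illustrative name, not anything from the thread):

from arpeggio import Match, RegExMatch

class MyToken(RegExMatch):
    # RegExMatch is itself a Match subclass, so instances of MyToken
    # still satisfy isinstance(obj, Match).
    pass

print(isinstance(MyToken(r'\d+'), Match))   # True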