Arpeggio
Reserved Keywords ?
Hi, first of all, thanks for this library!
Sorry if I'm missing something obvious, but I'm wondering how I can simply handle reserved keywords in my grammar with Arpeggio.
Suppose that I have some reserved keywords in my grammar, like class or function, and I don't want them to be recognized as valid identifiers:
from arpeggio import Kwd, EOF, ParserPython
from arpeggio import RegExMatch as _
##### GRAMMAR ################################################################
def identifier (): return _(r'[a-zA-Z]\w*') # generate ambiguities with reserved keywords
# ...
def class_body (): return '{', '}'
def class_name (): return identifier
def class_declaration (): return Kwd ('class'), class_name, class_body, EOF
##### MAIN ###################################################################
input_program = 'class class { }'
parser = ParserPython(class_declaration, ignore_case=False, debug=True)
parser.parse(input_program)
The code above parses the text 'class class { }' without errors, because the second word class matches the rule class_name:
?? Try match rule class_name=RegExMatch([a-zA-Z]\w*) in class_declaration at position 6 => class *class { }
++ Match 'class' at 6 => 'class *class* { }'
For now, I'm using the following workaround, which excludes keywords from the identifier regex:
reserved_keywords = ['class', 'function'] # ...
def identifier (): return _(r'(?!\b({})\b)([a-zA-Z]\w*)'.format ('|'.join (reserved_keywords)))
It works as I expected:
arpeggio.NoMatch: Expected class_name at position (1, 7) => 'class *class { }'.
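To see what the lookahead in that regex does on its own, here is a minimal sketch using only Python's re module, outside of Arpeggio. The helper name is_identifier and the use of re.escape are assumptions added for illustration; they are not part of the workaround above.

```python
import re

reserved_keywords = ['class', 'function']

# re.escape is a defensive addition in case a keyword ever contains
# regex metacharacters; for plain words it changes nothing.
alternation = '|'.join(re.escape(k) for k in reserved_keywords)

# (?!...) is a negative lookahead: it rejects the position if a whole
# keyword starts there, without consuming any input.
identifier_re = re.compile(r'(?!(?:{})\b)[a-zA-Z]\w*'.format(alternation))


def is_identifier(word):
    """Return True if the whole word is an identifier and not a keyword."""
    return identifier_re.fullmatch(word) is not None
```

The trailing \b matters: without it, names that merely start with a keyword, such as classy, would be rejected too.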
But is there something more automatic in Arpeggio to achieve the same purpose? I'm thinking of something like the Keyword class in PyPEG that internally maintains a list of keywords used in the grammar.
Thanks!
Hi,
The PEG way is to use a negative lookahead (the Not predicate) to check that there is no keyword ahead before trying to match an identifier.
def identifier (): return Not(reserved_keywords), _(r'...')
Another approach is to implement your own parsing expression classes.
In this case you could inherit from RegExMatch and override the _parse method to verify that there is no keyword ahead before calling super.
The first approach is probably what you want. The second is an interesting exercise in making custom parsing expressions. :)
Hi,
Thank you for your quick reply!
I was actually looking for a solution to handle keywords of the grammar more dynamically. The following code will do the job for me:
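from arpeggio import StrMatch, Not
from arpeggio import RegExMatch as _

class Kwd (StrMatch):
    # Class-level table shared by all instances: every keyword used in the grammar.
    _keyword_table = []

    def __init__ (self, to_match):
        super (Kwd, self).__init__ (to_match)
        self.to_match = to_match
        self.root = True
        self.rule_name = 'keyword'
        # Register each keyword once.
        if to_match not in Kwd._keyword_table:
            Kwd._keyword_table.append (to_match)

# The regex is built when this rule is evaluated, so it only contains keywords
# whose Kwd (...) instance has already been created at that point.
def reserved_keywords (): return _(r'(\b({})\b)'.format ('|'.join (Kwd._keyword_table)))
def identifier (): return Not (reserved_keywords), _(r'([a-zA-Z]\w*)')
# ...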