Arpeggio
Arpeggio copied to clipboard
Support Inheritance on Grammars and Visitors, enhance Visitor Definition
I love Arpeggio, but now when really working with it, I would like to have certain features :
since I do Parsers for many different Formats, there are some Grammars which are repeating over and Over again - so i want to put them into a class, inherit them, monkey patch them etc. - thats working at the moment like this :
class GrammarBase(object):
grammar = arpeggio.ParsingExpression()
whitespace = '\t '
class Visitor(arpeggio.PTNodeVisitor):
pass
class GrammarBasic(GrammarBase):
# PROBLEM1 - it works only with staticmethods
@staticmethod
def grammar_basic_1():
return arpeggio.RegExMatch(r"...")
@staticmethod
def grammar_basic_2():
return arpeggio.RegExMatch(r"...")
# grammar is the default I use - like in pypeg2
@staticmethod
def grammar():
return arpeggio.Sequence(arpeggio.RegExMatch(r"..."), GrammarBasic.grammar_basic_1, GrammarBasic.grammar_basic_2)
class Visitor(GrammarBase.Visitor):
# PROBLEM2 - the prefix "visit_" is really useless here
def visit_grammar_basic_1(self, node, children):
return <something useful>
def visit_grammar_basic_2(self, node, children):
return <something useful>
class GrammarSpecific1(GrammarBasic):
# override the default whitespace with a specific one
whitespace = '\t \n'
@staticmethod
def grammar_specific_1():
return arpeggio.Sequence(GrammarBasic.grammar_basic_1, GrammarBasic.grammar_basic_2)
# grammar is the default I use - like in pypeg2
@staticmethod
def grammar():
return arpeggio.Sequence(arpeggio.RegExMatch(r"..."), GrammarSpecific1.grammar_specific_1, GrammarBasic.grammar_basic_1)
class Visitor(GrammarBasic.Visitor, <and maybe some more>):
def visit_grammar_specific_1(self, node, children):
return <something useful>
# PROBLEM3 - Visitor gets confused, because it does not know
# which one to use - GrammarBasic.grammar or GrammarSpecific1.grammar
# so this is not possible !
# def visit_grammar(self, node, children):
# return <something useful>
def parse_file(path_file: Union[str, pathlib.Path], grammar: GrammarBase):
"""
>>> read_file(path_file='/some_file.txt', grammar=GrammarSpecific1())
"""
with open(str(path_file), 'r') as data_file:
string_data = data_file.read()
parser = arp.ParserPython(grammar.grammar, ws=grammar.whitespace)
parse_tree = parser.parse(string_data)
data = arp.visit_parse_tree(parse_tree, grammar.Visitor())
return data
So my proposals are :
-
do not only store the function name of grammar definitions in the parse nodes, but the Object specification - so we dont get name confusions. At the moment You can not use the same names in
GrammarBasicandGrammarSpecific. -
get rif of the
visit_prefix in the Visitor Class - its useless. You can look out for both for backward compatibility (with deprecation warning) -
Grammars should be able to be defined as standard methods of the class, not as
@staticmethod -
Visitors definition should be better like :
class Visitor(arpeggio.PTNodeVisitor):
# if grammar "grammarobject" like class.method is hit, some_name gots called
def some_name(self, grammarobject, node, children ):
return <something useful>
So the name of the Visitor.method can be chosen as You like, and the correct method of the Visitor should be called, even it is inherited from another Visitor super().....
In short : grammars defined in Classes should not collide, even if they have the same name as a method in their Base Class Visitors definition should be not ambiguouse , by not using the Method Name, but the rule object to define/map it.
I hope I made myself clear, here an example :
class Visitor(arpeggio.PTNodeVisitor):
# if grammar "grammarobject" like class.method is hit, some_name gots called
def some_name(self, grammarobject1, node, children ):
return <something useful>
def some_name2(self, grammarobject2, node, children ):
return <something useful>
class Visitor2(Visitor):
def some_name3(self, grammarobject3, node, children ):
return <something useful>
# on grammarobject4, Visitor2.some_name should be called
def some_name(self, grammarobject4, node, children ):
return <something useful>
# on grammarobject1, Visitor2.some_name5 should be called,
# because it is overwritten with that definition
def some_name5(self, grammarobject1, node, children ):
return <something useful>
what do You developers think - any chances for that ? do You think its a bad/good idea, and why ? Maybe there are easier ways and I use it wrong ? yours sincerely and желим вам пријатан дан Igor ! Robert
Hi Robert,
Thanks for the suggestions. I think that many of what you proposed make sense. Grammar/Visitor inheritance is definitely something that would be useful. Also I agree that visit_ prefix in Visitor method names is something that we could get rid of.
I'm not sure that I'm following your thoughts on point 4. What would be the purpose of grammarobject<x> param in the Visitor calls? Are you proposing to select the right method based on the name of the param?
At first thought, one way to attack on this would be to create a new parser class. Like we have now ParserPython and ParserPEG we could introduce ParserClass/ParserObject which would be a class based grammar definition with inheritance. That way we should stay mostly backward compatible.
I'll leave this issue open as a feature request so will play with it when find some time or if you or someone from the community wants to contribute please comment here.
Dear Igor,
yes, I agree, another ParserClass/ParserObject definitely makes sense.
for point 4 - I tried to archive that Visitors can be subclassed, and that the name of the methods do not collide. The names should have nothing to do with the method called - that means somehow we need to make a mapping between the grammar, and the visitor method to call. My first suggested approach was to put the grammar object as parameter . but I am not sure how to build a hashed list with that, which Visitor Method to be called when a grammar is found in the peg-tree.
maybe something like that :
# the new classed Visitor PTNodeClassVisitor ;-)
class Visitor(arpeggio.PTNodeClassVisitor):
# the mapping between grammars and Visitor Methods - so we can choose
# the visitor method names as we like - and we can change the mapping as we like
def visitor_mapping(self):
map=dict()
map[GrammarClass1.grammarmethod1] = self.some_name
map[GrammarClass1.grammarmethod2] = self.some_name2
return map
def some_name(self, node, children ):
return <something useful>
def some_name2(self,node, children ):
return <something useful>
class Visitor2(Visitor):
def visitor_mapping(self):
map = super().visitor_mapping() # get the mappings of the parent class
map[GrammarClass2.grammarmethod3]=self.some_name3
map[GrammarClass2.grammarmethod4]=self.some_name
map[GrammarClass1.grammarmethod1]=self.some_name5 # overwrite a mapping from the parent class
return map
def some_name3(self, node, children ):
return <something useful>
# even if the method has the same name as in Class "Visitor", it will not collide,
# because in the mapping dict we have:
# Visitor.somename and Visitor1.some_name mapping dict values
def some_name(self, node, children ):
return <something useful>
def some_name5(self, node, children ):
return <something useful>
I think that can be a good solution, what do You think ? хвала на труду, игор искрено Ваш Robert