parsimonious
Incorrect Error Reporting
I'm using parsimonious to parse X12, the old ANSI file format. I have an example where the reported error position is wrong. (My grammar really was wrong, but the misleading report had me looking in the wrong place for a long time.)
The grammar:

    l_isa_loop_isa = ("ISA") elem_sep ("00" / "03") elem_sep (any) elem_sep ("00" / "01") elem_sep (any) elem_sep ("01" / "14" / "20" / "27" / "28" / "29" / "30" / "33" / "ZZ") elem_sep (any) elem_sep ("00501") elem_sep (any) elem_sep ("P" / "T") elem_sep (any) segment_sep
    any = ~"[^~]"is
    elem_sep = "*"
    segment_sep = "~"
Applied to the string
"ISA00 00 ZZ123456789012345ZZ1234567890123460610151705*>00501000010216*0_T_:~"
parsimonious reports that it could not match the "ZZ" in the alternation ("01" / "14" / "20" / "27" / "28" / "29" / "30" / "33" / "ZZ"). However, after stepping through parsimonious with the debugger, I noticed that when "01" fails to match, this section

    if node is None and pos >= error.pos and (self.name or getattr(error.expr, 'name', None) is None):
        error.expr = self
        error.pos = pos

records that failure, and the error is never updated to the actual failure (matching the literal "00501"). I'm currently using Python 3.5. The fix could be as simple as the following, but I'm not sure whether there are edge cases where it would fail.
```python
class OneOf(Compound):
    """A series of expressions, one of which must match

    Expressions are tested in order from first to last. The first to succeed
    wins.

    """
    def _uncached_match(self, text, pos, cache, error):
        cached_pos = error.pos
        for m in self.members:
            node = m.match_core(text, pos, cache, error)
            if node is not None:
                # Wrap the succeeding child in a node representing the OneOf:
                error.expr = None
                error.pos = cached_pos
                return Node(self.name, text, pos, node.end, children=[node])
```
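
To illustrate the behaviour without the full X12 grammar, here is a toy model of the bookkeeping, not parsimonious itself. `ErrorRecord`, `Literal`, `one_of`, and `failing_literal` are hypothetical names made up for this sketch. It assumes that unnamed expressions carry an empty-string name and that the error record starts with `expr = None` and `pos = -1`. Only the `if` condition is copied from the snippet quoted above.

```python
class ErrorRecord:
    """Hypothetical stand-in for the error object threaded through match_core."""
    def __init__(self):
        self.expr = None
        self.pos = -1  # assumed starting value


class Literal:
    """Toy literal. The '' default name is an assumption of this sketch."""
    def __init__(self, literal, name=''):
        self.literal = literal
        self.name = name

    def match_core(self, text, pos, error):
        end = pos + len(self.literal) if text.startswith(self.literal, pos) else None
        # The condition quoted in this issue, applied to a failed match:
        if end is None and pos >= error.pos and (
                self.name or getattr(error.expr, 'name', None) is None):
            error.expr = self
            error.pos = pos
        return end


def one_of(members, text, pos, error, reset_on_success):
    """Toy alternation; optionally applies the reset proposed above."""
    cached_pos = error.pos
    for m in members:
        end = m.match_core(text, pos, error)
        if end is not None:
            if reset_on_success:
                error.expr = None
                error.pos = cached_pos
            return end
    return None


def failing_literal(reset_on_success):
    """Match ("01" / "ZZ") "*" "00501" against text whose last field is wrong."""
    text = "ZZ*00400"
    error = ErrorRecord()
    pos = one_of([Literal("01"), Literal("ZZ")], text, 0, error, reset_on_success)
    for lit in (Literal("*"), Literal("00501")):
        if pos is None:
            break
        pos = lit.match_core(text, pos, error)
    return error.expr.literal, error.pos


print(failing_literal(reset_on_success=False))  # ('01', 0)
print(failing_literal(reset_on_success=True))   # ('00501', 3)
```

Under these assumptions, the failed "01" branch is recorded first, and the later unnamed "00501" failure cannot replace it because `'' is None` is false. Clearing the record once the alternation succeeds lets the later failure through. Whether that reset also discards useful information from a failed branch in other grammars is exactly the edge case I'm unsure about.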