guidance icon indicating copy to clipboard operation
guidance copied to clipboard

[WIP] Fix infinite loop when computing parse tree for recursive nullable grammars

Open hudson-ai opened this issue 1 year ago • 2 comments

Computing the parse tree (needed for getting captures) previously caused an infinite loop if the grammar has loops of nullable rules, where a nullable rule is any rule where every symbol on the RHS is nullable.

E.g., the grammar

  A -> A C
  A -> B
  A ->
  B -> A
  C -> 'x'

has nullable rules

  A -> B
  A ->
  B -> A

and there is a loop in these rules formed by A -> B and B -> A.

For a deeper discussion on this, see section "One Last Trap" of https://loup-vaillant.fr/tutorials/earley-parsing/parser

This PR solves the issue by keeping a set of visited nodes when computing the parse tree (or getting captures) and skipping any node that has already been seen. AFAIK, the only way this can happen is when we have these nullable loops, but admittedly I still have a lot to learn about the parser (so more eyes on this would be really helpful!)

This closes #863

hudson-ai avatar Jun 02 '24 04:06 hudson-ai

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 60.33%. Comparing base (15ff21d) to head (867adbe). Report is 36 commits behind head on main.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #874      +/-   ##
==========================================
+ Coverage   57.02%   60.33%   +3.30%     
==========================================
  Files          63       63              
  Lines        4575     4583       +8     
==========================================
+ Hits         2609     2765     +156     
+ Misses       1966     1818     -148     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jun 02 '24 04:06 codecov-commenter

Currently, my fix is to avoid infinitely recursing over children, but I suspect that it might be nicer to prevent accepting parses that have infinitely looping children to begin with. That being said, the behavior seems correct (in that captures are correctly being... captured).

@slundberg I would really appreciate some input from you on this as well as what the correct semantics of partial captures should be (I tried to preserve existing behavior, but I'm not entirely clear on whether that behavior was the intended one).

hudson-ai avatar Jun 05 '24 16:06 hudson-ai