lark Bugfix Earley: only transform the solutions we yield

Fixes an issue overlooked by PR #1444

Addresses performance issues mentioned in #1443

Aug 15 '24 08:08 erezsh

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.78%. Comparing base (acfe33d) to head (a884877).

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1447   +/-   ##
=======================================
  Coverage   89.78%   89.78%           
=======================================
  Files          52       52           
  Lines        7821     7823    +2     
=======================================
+ Hits         7022     7024    +2     
  Misses        799      799

Flag	Coverage Δ
unittests	`89.78% <100.00%> (+<0.01%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Aug 15 '24 08:08 codecov-commenter

Yes, that's a good point. It should return several solutions.

I think the correct thing is to break the API, since the current behavior is broken. But that would break compatibility.

Another option is to distinguish between ambiguity="forest" and ="forests" .. but that sounds a bit silly.

Another option is to return a new SymbolNode that has the solutions as its children. That way we don't introduce a new type (or a python list), but then it's not clear to me what its arguments should be (meaning s, start, end), or if it would constitute a valid "forest" object.

Or we can just leave it as-is.. (no one has complained so far.)

@chanicpanic Maybe you have an opinion about this? (re solutions being an array, not always a single SymbolNode)

Aug 15 '24 16:08 erezsh

I agree that an array is correct given the current behavior. Although, I wouldn't make the change just yet because I believe the earley implementation can be adjusted so that we don't end up with more than one distinct root in the first place. Sharing the node_cache between scan and predict_and_complete seems to do the trick, albeit with a few test failures to look into.

Aug 16 '24 05:08 chanicpanic

No longer relevant.

Aug 30 '24 08:08 erezsh

lark lark copied to clipboard

Bugfix Earley: only transform the solutions we yield

Codecov Report

lark
lark copied to clipboard