
repeat.as outputs [] for empty string input

Open hagabaka opened this issue 10 years ago • 6 comments

Normally, when you call as on a repeat, the matched string ends up in the result hash:

str('a').repeat.as(:b).parse('aaaaa')
# => {:b=>"aaaaa"@0}

However, when the input string is empty (which repeat accepts by default), the result contains an empty array instead:

str('a').repeat.as(:b).parse('')
# => {:b=>[]}

This inconsistency makes it hard to write transform rules: "simple" only matches when the input string is non-empty, and "sequence" only matches when it is empty:

transform = Transform.new {rule(:b => simple(:b)) {b}}
transform.apply str('a').repeat.as(:b).parse('aaaaa')
# => "aaaaa"@0
transform.apply str('a').repeat.as(:b).parse('')
# => {:b=>[]}

transform = Transform.new {rule(:b => sequence(:b)) {b.join}}
transform.apply str('a').repeat.as(:b).parse('aaaaa')
# => {:b=>"aaaaa"@0}
transform.apply str('a').repeat.as(:b).parse('')
# => ""

Of course, if the subtree is as simple as {:b => '...'} or {:b => []}, another transform rule can normalize it. But if the subtree has multiple keys, writing that rule becomes tedious, because every combination of empty and non-empty values needs its own pattern. Is there a reason the parser shouldn't just output an empty string for repeat.as when the input is empty?
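A minimal user-side workaround, assuming you know which keys wrap a plain-string repeat: normalize the parse tree once before handing it to the Transform, instead of writing one rule per shape (with n such keys, each either [] or a string, that is up to 2^n patterns). normalize_empty_repeats is a hypothetical helper, not part of parslet.

```ruby
# Hypothetical helper (not part of parslet): turn the [] that an empty
# `repeat.as(key)` produces into "" for the listed keys, anywhere in the tree.
def normalize_empty_repeats(tree, keys)
  case tree
  when Hash
    tree.each_with_object({}) do |(k, v), out|
      out[k] = if keys.include?(k) && v == []
                 ''                                  # empty repeat -> empty string
               else
                 normalize_empty_repeats(v, keys)    # recurse into nested values
               end
    end
  when Array
    tree.map { |e| normalize_empty_repeats(e, keys) }
  else
    tree # strings (or parslet slices) pass through untouched
  end
end

result = normalize_empty_repeats({ first: [], last: 'Smith', tags: [] }, [:first, :last])
result[:first] # => ""
result[:tags]  # => [] (left alone because :tags is not listed)
```

In practice this would sit between the parser and the transform, e.g. transform.apply(normalize_empty_repeats(tree, [:b])). Only list keys whose repeat has no named children; for named repeats, [] is a meaningful empty list.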

hagabaka, Feb 01 '15 22:02

Yes, there is. It becomes apparent when you do something like this:

str('a').as(:a).repeat.as(:b).parse('aaaaa')
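With named children, the repetition produces a list of hashes (roughly {:b=>[{:a=>"a"@0}, {:a=>"a"@1}, ...]}), so [] is the natural empty value there. The fold below is a simplified toy model, not parslet's source: once nothing has matched, there are no child results left to inspect, so it cannot tell whether the empty case should look like the string shape ("") or the list-of-hashes shape ([]).

```ruby
# Simplified model (NOT parslet's actual code) of how a repetition could fold
# its children's results into one value.
def fold_repetition(results)
  if results.any? { |r| r.is_a?(Hash) }
    results.select { |r| r.is_a?(Hash) } # named children: keep the list of hashes
  elsif results.empty?
    []                                   # nothing matched: no child to inspect
  else
    results.join                         # unnamed children: concatenate strings
  end
end

fold_repetition(%w[a a a])                # => "aaa"
fold_repetition([{ a: 'a' }, { a: 'a' }]) # two hashes, one per match
fold_repetition([])                       # => [] (the ambiguous case)
```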

However, I would consider adding another (third and last) argument to repeat that specifies whether an empty match should result in nil or in []; parslet can't really know without an explicit indication. How would you like that?

# Fantasy code ahead: 
str('a').repeat(no_match: nil).as(:b).parse('aaaaa')
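The proposed option would only change the result when nothing matches (e.g. parse('')); for 'aaaaa' the output stays the same. As a toy sketch of the idea (no_match: is the fantasy option from the comment above, not an existing parslet API, and fold_with_no_match is a hypothetical name), the caller simply supplies the value an empty repetition should yield:

```ruby
# Hypothetical sketch: parslet has no `no_match:` option; this toy fold only
# illustrates letting the caller decide what an empty repetition yields.
def fold_with_no_match(results, no_match: [])
  return no_match if results.empty?        # caller decides: [], nil, "" ...
  hashes = results.select { |r| r.is_a?(Hash) }
  return hashes unless hashes.empty?       # named children: list of hashes
  results.join                             # plain children: one string
end

fold_with_no_match([])                # => [] (default, same as today)
fold_with_no_match([], no_match: nil) # => nil
fold_with_no_match([], no_match: '')  # => ""
fold_with_no_match(%w[a a])           # => "aa"
```

Defaulting no_match to [] keeps existing grammars unchanged; only rules that opt in would see a different empty value.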

kschiess, Feb 09 '15 10:02

Just want to say that I also have some rules I would like to clean up to remove the duplication. +1, as it were; the suggestion sounds good.

rubydesign, Feb 09 '15 13:02

str('a') is a Parslet::Atoms::Str, while str('a').as(:a) is a Parslet::Atoms::Named. Could repeat automatically determine its as output for empty input based on this difference?

hagabaka, Feb 09 '15 17:02

If you add multiple layers of Entity, Sequence, ... on top, you won't be able to tell.
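To see why, here is a toy static check over stand-in node structs (Str, Named, Sequence, Alternative and Entity below are simplified structs, not parslet's Parslet::Atoms classes). The toy assumes named output wins when a sequence mixes plain and named children. An alternative whose branches disagree can only be resolved at parse time, and a recursive rule makes this conservative walk give up with :unknown, even for a grammar that only ever produces strings.

```ruby
# Toy stand-ins (not parslet classes). Entity models a lazily resolved,
# possibly recursive rule reference.
Str         = Struct.new(:text)
Named       = Struct.new(:child, :name)
Sequence    = Struct.new(:children)
Alternative = Struct.new(:children)
Entity      = Struct.new(:name, :resolver) do
  def child
    resolver.call
  end
end

# :string = never named output, :hash = always named output,
# :unknown = depends on which branch matches, or recursion was hit.
def classify(atom, visiting = {})
  case atom
  when Str then :string
  when Named then :hash
  when Sequence
    kinds = atom.children.map { |c| classify(c, visiting) }
    return :hash if kinds.include?(:hash)        # toy assumption: named wins
    kinds.include?(:unknown) ? :unknown : :string
  when Alternative
    kinds = atom.children.map { |c| classify(c, visiting) }.uniq
    kinds.size == 1 ? kinds.first : :unknown     # branches disagree
  when Entity
    return :unknown if visiting[atom.name]       # recursive rule: give up
    visiting[atom.name] = true
    kind = classify(atom.child, visiting)
    visiting.delete(atom.name)
    kind
  else
    raise ArgumentError, "unknown node #{atom.inspect}"
  end
end

classify(Named.new(Str.new('a'), :a))                                  # => :hash
classify(Alternative.new([Named.new(Str.new('a'), :x), Str.new('b')])) # => :unknown

# expr := 'x' | '(' expr ')'  (only ever produces strings)
expr = Entity.new(:expr, lambda {
  Alternative.new([Str.new('x'), Sequence.new([Str.new('('), expr, Str.new(')')])])
})
classify(expr) # => :unknown, because the walk stops at the recursion
```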

kschiess, Feb 10 '15 08:02

I've thought about this and see the opportunity for improvement now. I'll execute your last idea as soon as I get to it.

kschiess, Nov 24 '16 13:11

Is there any plan to implement this? This issue is old, but it looks like there has been some recent activity on the repo. I agree with everything @hagabaka said above.

smackesey, Jul 01 '20 21:07