Bug report: error with \L, \l, \U and \u operators
From Serge
Hello Perl maintainers,
Do the operators \L, \l, \U, and \u have right-to-left associativity, or vice versa? I think the operators must have right-to-left associativity, like =. Do the operators have a property such as priority?
print "\u\LdD\n"; # Dd It seems, first works \L, then \u. Right.
print "\u\la\n"; # A First \l, then \u. Right
print "\l\ua\n"; # a First \u, then \l. Right
print "\L\udD\n"; # Dd It seems, first works \L, then \u. I think, it's a bug!
print lc "\udD\n"; # dd Yes, the result differ from previous line!
print "\LdD\udD\n"; # dddd It seems, first works \u, then \L, hm...
print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error
print "\u\la\n"; # A Right
print "\u\ua\n"; # A Right
-- Regards, Serge
From @abigail
On Mon, Feb 21, 2011 at 06:57:58AM -0800, Serge wrote:
# New Ticket Created by Serge
# Please include the string: [perl #84578]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=84578 >
Hello Perl maintainers,
Do the operators \L, \l, \U, and \u have right-to-left associativity, or vice versa? I think the operators must have right-to-left associativity, like =. Do the operators have a property such as priority?
All that's being said about priorities in "Gory details of parsing quoted constructs" in perlop is:
All operations above are performed simultaneously, left to right.
Which is vague enough that any behaviour described below can be explained away as "not a bug". ;-)
print "\u\LdD\n"; # Dd It seems, first works \L, then \u. Right.
print "\u\la\n"; # A First \l, then \u. Right
print "\l\ua\n"; # a First \u, then \l. Right
print "\L\udD\n"; # Dd It seems, first works \L, then \u. I think, it's a bug!
I tend to agree. I'd expect it to be equivalent to
lc ucfirst "dD";
But there's the "left to right" statement. Whatever that means.
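For reference, the two candidate orders can be written with the named functions, which removes any ambiguity about which operation runs first. This is only a sketch of the two possible readings, not a statement about what the tokenizer does internally; the variable names are illustrative.

```perl
use strict;
use warnings;

my $input = "dD";

# Reading 1: ucfirst runs first, then lc lowercases the whole result.
my $lc_of_ucfirst = lc ucfirst $input;    # lc("DD")

# Reading 2: lc runs first, then ucfirst capitalizes the first character.
my $ucfirst_of_lc = ucfirst lc $input;    # ucfirst("dd")

print "lc ucfirst: $lc_of_ucfirst\n";     # dd
print "ucfirst lc: $ucfirst_of_lc\n";     # Dd
```

The report says "\L\udD" prints Dd, which corresponds to the second reading even though the escapes are written in the order of the first.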
print lc "\udD\n"; # dd Yes, the result differ from previous line!
print "\LdD\udD\n"; # dddd It seems, first works \u, then \L, hm...
That appears to be inconsistent with "\L\udD\n";
print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error
That's just plain weird, IMO.
print "\u\la\n"; # A Right
print "\u\ua\n"; # A Right
Abigail
The RT System itself - Status changed from 'new' to 'open'
From @dcollinsn
On Tue Feb 22 02:31:59 2011, abigail@abigail.be wrote:
print "\L\udD\n"; # Dd It seems, first works \L, then \u. I think, it's a bug!
I tend to agree. I'd expect it to be equivalent to
lc ucfirst "dD";
But there's the "left to right" statement. Whatever that means.
print lc "\udD\n"; # dd Yes, the result differ from previous line!
That appears to be inconsistent with "\L\udD\n";
print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error
That's just plain weird, IMO.
Abigail
This is profoundly strange, and is still in blead as described above. Precedence issues aside, I think that "\L\udD" should eq "dd", and "\L\UdD" should also eq "dd" (and in any event should be valid syntax). I thought I understood how these parsed after digging into the other precedence ticket out there, but evidently I do not.
-- Respectfully, Dan Collins
From @khwilliamson
On 08/15/2016 10:40 AM, Dan Collins via RT wrote:
On Tue Feb 22 02:31:59 2011, abigail@abigail.be wrote:
print "\L\udD\n"; # Dd It seems, first works \L, then \u. I think, it's a bug!
I tend to agree. I'd expect it to be equivalent to
lc ucfirst "dD";
But there's the "left to right" statement. Whatever that means.
print lc "\udD\n"; # dd Yes, the result differ from previous line!
That appears to be inconsistent with "\L\udD\n";
print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error
That's just plain weird, IMO.
Abigail
This is profoundly strange, and is still in blead as described above. Precedence issues aside, I think that "\L\udD" should eq "dd", and "\L\UdD" should also eq "dd" (and in any event should be valid syntax). I thought I understood how these parsed after digging into the other precedence ticket out there, but evidently I do not.
The whole thing is broken. I'm not sure I agree with your assessment.
IIRC we decided that someone would look thoroughly at the situation and come back with a proposal. I thought demerphq was doing it, and he thought I was doing it, and we both hoped someone else would do it. And there it remains.
In thinking about it lately, I:
a) wonder if we should create a single ticket for this, including \Q, and merge all the other tickets into it.
b) note that the regex pattern results diverge from the double-quoted string results, and the latter is more sane; so that the regex code should be made to work more like the double-quoted code.
$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)
silently turns what probably was meant to be the assertion \A into a BELL character.
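To make the consequence concrete: lowercasing the pattern source text turns the two characters \A into \a, and \a in a regex matches the alarm (BEL) character. The sketch below reproduces that with lc on a single-quoted string instead of relying on \L inside qr//, so it illustrates the effect described here under that assumption, not the tokenizer's actual code path.

```perl
use strict;
use warnings;

# Single quotes keep the backslash literal, so this is the five-character
# pattern source text \ABCD (a stand-in for what \L gets to see).
my $source  = '\ABCD';
my $lowered = lc $source;            # the text \abcd

my $re = qr/$lowered/;

# \a in a regex is the alarm (BEL) character, chr(7).
my $matches_bell  = ("\x07bcd" =~ $re) ? 1 : 0;
# The \A assertion is gone, so a plain "ABCD" no longer matches.
my $matches_plain = ("ABCD" =~ $re) ? 1 : 0;

print "lowered pattern: $lowered\n";
print "matches BEL + bcd: $matches_bell\n";
print "matches ABCD: $matches_plain\n";
```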
$ blead -le 'print "\L\ABCD"'
Unrecognized escape \A passed through at -e line 1.
abcd
acts in what I consider a sane way, as does this:
$ blead -le 'print "\l\ABCD"'
Unrecognized escape \A passed through at -e line 1.
aBCD
but I don't know about this:
blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)
From @cpansprout
On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:
but I don't know about this:
blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)
lcfirst '\A' is equivalent to lc('\\') . 'A'. No surprises there.
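A quick check of that equivalence, as a sketch using the same ABCD text as the example; the variable names are only for illustration. The first character of the source text is the backslash, and lowercasing a backslash leaves it unchanged, so the whole string comes back as it went in.

```perl
use strict;
use warnings;

my $source = '\ABCD';                 # backslash, then ABCD

my $via_lcfirst = lcfirst $source;
my $by_hand     = lc('\\') . substr($source, 1);

print "lcfirst: $via_lcfirst\n";
print "by hand: $by_hand\n";
print "unchanged: ", ($via_lcfirst eq $source ? "yes" : "no"), "\n";
```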
Most of the code that handles this is in the tokenizer. I know that code fairly well, so I could fix it easily. I just need to know *how* things *should* behave.
--
Father Chrysostomos
From @cpansprout
On Mon Aug 15 14:24:23 2016, sprout wrote:
On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:
but I don't know about this:
blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)
lcfirst '\A' is equivalent to lc('\\') . 'A'. No surprises there.
Oh, I see what you are getting at. qq behaves differently, because things happen in a different order:
$ perl -lwe 'print "\l\ABCD"'
Unrecognized escape \A passed through at -e line 1.
aBCD
--
Father Chrysostomos
From @cpansprout
On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:
In thinking about it lately, I:
a) wonder if we should create a single ticket for this, including \Q, and merge all the other tickets into it.
I think we actually have two separate issues here. This ticket is about \L\l\U\u etc. not ‘nesting’ consistently (sometimes nesting; sometimes not; sometimes implicitly transposed).
b) note that the regex pattern results diverge from the double-quoted string results, and the latter is more sane; so that the regex code should be made to work more like the double-quoted code.
$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)
silently turns what probably was meant to be the assertion \A into a BELL character.
And this is a *separate* issue; namely, that regular expressions do not apply character escapes and case modifiers in the same order.
They do not have to be fixed at the same time.
--
Father Chrysostomos
From @khwilliamson
On 08/15/2016 03:54 PM, Father Chrysostomos via RT wrote:
On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:
In thinking about it lately, I:
a) wonder if we should create a single ticket for this, including \Q, and merge all the other tickets into it.
I think we actually have two separate issues here. This ticket is about \L\l\U\u etc. not ‘nesting’ consistently (sometimes nesting; sometimes not; sometimes implicitly transposed).
b) note that the regex pattern results diverge from the double-quoted string results, and the latter is more sane; so that the regex code should be made to work more like the double-quoted code.
$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)
silently turns what probably was meant to be the assertion \A into a BELL character.
And this is a *separate* issue; namely, that regular expressions do not apply character escapes and case modifiers in the same order.
They do not have to be fixed at the same time.
Perhaps not, but any decision will need to consider the effects on the totality of the language.
As I recently commented on the mailing list:
To put my 2c in for this part, it is necessary and useful that certain ones nest:
perl -E'say "\u\LfoO"'
Foo
perl -E'say "\l\UFoO"'
fOO
So unless there's a compelling reason otherwise it seems intuitive for them all to work consistently with that.
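The nesting in those two one-liners matches the corresponding function compositions, where the single-character modifier wraps the result of the whole-string one. A small sketch of that correspondence, with variable names chosen only for illustration:

```perl
use strict;
use warnings;

# \u applied to the result of \L: capitalize the lowercased string.
my $escaped_title = "\u\LfoO";
my $nested_title  = ucfirst lc "foO";

# \l applied to the result of \U: lowercase the first character of
# the uppercased string.
my $escaped_lower_first = "\l\UFoO";
my $nested_lower_first  = lcfirst uc "FoO";

print "$escaped_title $nested_title\n";               # Foo Foo
print "$escaped_lower_first $nested_lower_first\n";   # fOO fOO
```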
Just wanted to note that double-quoted strings and regexes necessarily behave differently with regard to escape characters, and that this in turn interacts with \U, \L and friends differently. The basic issue is that in the regex engine escapes do not mean their literal equivalent, and in a double-quoted string they do. Arguably, in regex quoting, \Q, \L, \U and friends should be deferred to the regex engine and act as modifiers to the regex parser, not be converted by the toker at all. We should focus on getting the rules right for double-quoted strings, and then have the regex engine simulate that as much as is sensible.
Consider that /a\x{7c}b/ matches very differently to /a|b/, but "a\x{7c}b" and "a|b" are the same strings.
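A short sketch of that difference. The patterns are anchored with \A and \z (an addition for the sake of the demonstration) so that the alternation and the literal pipe are easy to tell apart:

```perl
use strict;
use warnings;

# As strings, the escape and the literal character are identical.
my $strings_equal = ("a\x{7c}b" eq "a|b") ? 1 : 0;

# In a regex, \x{7c} is a literal '|' character, not alternation.
my $escaped_matches_a    = ("a"   =~ /\Aa\x{7c}b\z/) ? 1 : 0;
my $escaped_matches_pipe = ("a|b" =~ /\Aa\x{7c}b\z/) ? 1 : 0;

# A bare '|' is alternation: either "a" or "b".
my $alt_matches_a = ("a" =~ /\A(?:a|b)\z/) ? 1 : 0;

print "strings equal: $strings_equal\n";                  # 1
print "escaped pattern vs 'a': $escaped_matches_a\n";     # 0
print "escaped pattern vs 'a|b': $escaped_matches_pipe\n"; # 1
print "alternation vs 'a': $alt_matches_a\n";             # 1
```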