Bug report: error with \L, \l, \U and \u operators

Open p5pRT opened this issue 14 years ago • 11 comments

Migrated from rt.perl.org#84578 (status was 'open')

Searchable as RT84578$

p5pRT avatar Feb 21 '11 14:02 p5pRT

From [email protected]

Hello Perl maintainers,

Do the operators \L, \l, \U, and \u have right-to-left associativity, or vice versa? I think the operators should have right-to-left associativity, like =. Do these operators have a precedence?

print "\u\LdD\n"; # Dd It seems, first works \L, then \u. Right.
print "\u\la\n"; # A First \l, then \u. Right
print "\l\ua\n"; # a First \u, then \l. Right
print "\L\udD\n"; # Dd It seems, first works \L, then \u. I think, it's a bug!
print lc "\udD\n"; # dd Yes, the result differ from previous line!
print "\LdD\udD\n"; # dddd It seems, first works \u, then \L, hm...

print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error
print "\u\la\n"; # A Right
print "\u\ua\n"; # A Right

-- Regards, Serge

p5pRT avatar Feb 21 '11 14:02 p5pRT

From @abigail

On Mon, Feb 21, 2011 at 06:57:58AM -0800, Serge wrote:

# New Ticket Created by Serge
# Please include the string: [perl #84578]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=84578 >

Hello Perl maintainers,

Do the operators \L, \l, \U, and \u have right-to-left associativity, or vice versa? I think the operators should have right-to-left associativity, like =. Do these operators have a precedence?

All that's being said about priorities in "Gory details of parsing quoted constructs" in perlop is:

  All operations above are performed simultaneously, left to right.

Which is vague enough that any behaviour described below can be explained away as "not a bug". ;-)

print "\u\LdD\n"; # Dd It seems, first works \L, then \u. Right.
print "\u\la\n"; # A First \l, then \u. Right
print "\l\ua\n"; # a First \u, then \l. Right
print "\L\udD\n"; # Dd It seems, first works \L, then \u. I think, it's a bug!

I tend to agree. I'd expect it to be equivalent to

  lc ucfirst "dD";

But there's the "left to right" statement. Whatever that means.
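The two readings at issue can be spelled out with explicit function calls. This is only a minimal sketch; the variable names are made up for illustration, and the last line simply repeats the interpolated form from the report so the results can be compared on whatever perl is at hand.

```perl
use strict;
use warnings;

my $input = "dD";

# Reading 1: \u acts first, then \L lowercases the whole result.
my $lc_of_ucfirst = lc(ucfirst($input));    # "dd"

# Reading 2: \L acts first, then \u capitalizes the first character.
my $ucfirst_of_lc = ucfirst(lc($input));    # "Dd"

print "lc(ucfirst(...)): $lc_of_ucfirst\n";
print "ucfirst(lc(...)): $ucfirst_of_lc\n";

# The interpolated form from the report (this ticket says it prints Dd).
print "\L\udD\n";
```

If the interpolated line matches the second reading rather than the first, the modifiers are effectively being transposed, which is the behaviour the report calls a bug.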

print lc "\udD\n"; # dd Yes, the result differ from previous line!
print "\LdD\udD\n"; # dddd It seems, first works \u, then \L, hm...

That appears to be inconsistent with "\L\udD\n";

print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error

That's just plain weird, IMO.

print "\u\la\n"; # A Right
print "\u\ua\n"; # A Right

Abigail

p5pRT avatar Feb 22 '11 10:02 p5pRT

The RT System itself - Status changed from 'new' to 'open'

p5pRT avatar Feb 22 '11 10:02 p5pRT

From @dcollinsn

On Tue Feb 22 02:31:59 2011, abigail@abigail.be wrote:

print "\L\udD\n"; # Dd It seems, first works \L, then \u. I think, it's a bug!

I tend to agree. I'd expect it to be equivalent to

lc ucfirst "dD";

But there's the "left to right" statement. Whatever that means.

print lc "\udD\n"; # dd Yes, the result differ from previous line!

That appears to be inconsistent with "\L\udD\n";

print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error

That's just plain weird, IMO.

Abigail

This is profoundly strange, and is still in blead as described above. Precedence issues aside, I think that "\L\udD" should eq "dd", and "\L\UdD" should also eq "dd" (and in any event should be valid syntax). I thought I understood how these parsed after digging into the other precedence ticket out there, but evidently I do not.

-- Respectfully, Dan Collins

p5pRT avatar Aug 15 '16 16:08 p5pRT

From @khwilliamson

On 08/15/2016 10:40 AM, Dan Collins via RT wrote:

On Tue Feb 22 02:31:59 2011, abigail@abigail.be wrote:

print "\L\udD\n"; # Dd It seems, first works \L, then \u. I think, it's a bug!

I tend to agree. I'd expect it to be equivalent to

lc ucfirst "dD";

But there's the "left to right" statement. Whatever that means.

print lc "\udD\n"; # dd Yes, the result differ from previous line!

That appears to be inconsistent with "\L\udD\n";

print "\L\Ua\n"; # Syntax error, oops!
print "\U\La\n"; # Syntax error
print "\L\La\n"; # Syntax error

That's just plain weird, IMO.

Abigail

This is profoundly strange, and is still in blead as described above. Precedence issues aside, I think that "\L\udD" should eq "dd", and "\L\UdD" should also eq "dd" (and in any event should be valid syntax). I thought I understood how these parsed after digging into the other precedence ticket out there, but evidently I do not.

The whole thing is broken. I'm not sure I agree with your assessment.

IIRC we decided that someone would look thoroughly at the situation and come back with a proposal. I thought demerphq was doing it, and he thought I was doing it, and we both hoped someone else would do it. And there it remains.

In thinking about it lately, I:

a) wonder if we should create a single ticket for this, including \Q, and merge all the other tickets into it.

b) note that the regex pattern results diverge from the double-quoted string results, and the latter is more sane; so that the regex code should be made to work more like the double-quoted code.

$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)

silently turns what probably was meant to be the assertion \A into a BELL character.

$ blead -le 'print "\L\ABCD"'
Unrecognized escape \A passed through at -e line 1.
abcd

acts like what I consider sanely, as does this:

$ blead -le 'print "\l\ABCD"'
Unrecognized escape \A passed through at -e line 1.
aBCD

but I don't know about this:

$ blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)

p5pRT avatar Aug 15 '16 19:08 p5pRT

From @cpansprout

On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:

but I don't know about this:

$ blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)

lcfirst '\A' is equivalent to lc('\\') . 'A'. No surprises there.
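That equivalence is easy to check directly. A small sketch, with illustrative variable names; the single-quoted '\A' is two characters, a backslash followed by A:

```perl
use strict;
use warnings;

my $pattern_source = '\A';    # backslash, then A (no escape processing in single quotes)

# lcfirst only touches the first character, which is the backslash.
my $via_lcfirst = lcfirst($pattern_source);

# The same thing spelled out by hand: lowercase the backslash, append 'A'.
my $by_hand = lc('\\') . 'A';

print $via_lcfirst eq $by_hand ? "same\n" : "different\n";
print length($via_lcfirst), "\n";
```

Lowercasing a backslash is a no-op, so the A survives untouched, which is why the qr// output still shows \ABCD.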

Most of the code that handles this is in the tokenizer. I know that code fairly well, so I could fix it easily. I just need to know *how* things *should* behave.

--

Father Chrysostomos

p5pRT avatar Aug 15 '16 21:08 p5pRT

From @cpansprout

On Mon Aug 15 14:24:23 2016, sprout wrote:

On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:

but I don't know about this:

$ blead -le 'print qr/\l\ABCD/'
(?^:\ABCD)

lcfirst '\A' is equivalent to lc('\\') . 'A'. No surprises there.

Oh, I see what you are getting at. qq behaves differently, because things happen in a different order:

$ perl -lwe 'print "\l\ABCD"'
Unrecognized escape \A passed through at -e line 1.
aBCD

--

Father Chrysostomos

p5pRT avatar Aug 15 '16 21:08 p5pRT

From @cpansprout

On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:

In thinking about it lately, I:

a) wonder if we should create a single ticket for this, including \Q, and merge all the other tickets into it.

I think we actually have two separate issues here. This ticket is about \L\l\U\u etc. not ‘nesting’ consistently (sometimes nesting; sometimes not; sometimes implicitly transposed).

b) note that the regex pattern results diverge from the double-quoted string results, and the latter is more sane; so that the regex code should be made to work more like the double-quoted code.

$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)

silently turns what probably was meant to be the assertion \A into a BELL character.

And this is a *separate* issue; namely, that regular expressions do not apply character escapes and case modifiers in the same order.

They do not have to be fixed at the same time.

--

Father Chrysostomos

p5pRT avatar Aug 15 '16 21:08 p5pRT

From @khwilliamson

On 08/15/2016 03:54 PM, Father Chrysostomos via RT wrote:

On Mon Aug 15 12:00:59 2016, public@khwilliamson.com wrote:

In thinking about it lately, I:

a) wonder if we should create a single ticket for this, including \Q, and merge all the other tickets into it.

I think we actually have two separate issues here. This ticket is about \L\l\U\u etc. not ‘nesting’ consistently (sometimes nesting; sometimes not; sometimes implicitly transposed).

b) note that the regex pattern results diverge from the double-quoted string results, and the latter is more sane; so that the regex code should be made to work more like the double-quoted code.

$ blead -le 'print qr/\L\ABCD/'
(?^:\abcd)

silently turns what probably was meant to be the assertion \A into a BELL character.

And this is a *separate* issue; namely, that regular expressions do not apply character escapes and case modifiers in the same order.

They do not have to be fixed at the same time.

Perhaps not, but any decision will need to consider the effects on the totality of the language.

p5pRT avatar Aug 15 '16 22:08 p5pRT

As I recently commented on the mailing list:

To put my 2c in for this part, it is necessary and useful that certain ones nest:

perl -E'say "\u\LfoO"'
Foo

perl -E'say "\l\UFoO"'
fOO

So unless there's a compelling reason otherwise it seems intuitive for them all to work consistently with that.
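To make the nesting reading concrete, here is a rough sketch that compares the two interpolated forms above with the explicit nested calls they appear to correspond to. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Interpolated forms from the examples above.
my $title_interp = "\u\LfoO";
my $lower_interp = "\l\UFoO";

# The nested function calls they read like: inner \L / \U first, outer \u / \l last.
my $title_nested = ucfirst(lc("foO"));
my $lower_nested = lcfirst(uc("FoO"));

print "$title_interp / $title_nested\n";
print "$lower_interp / $lower_nested\n";
```

Under this reading, each single-character modifier wraps the result of the whole-string modifier that follows it.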

Grinnz avatar Aug 04 '22 16:08 Grinnz

Just wanted to note that double-quoted strings and regexes necessarily behave differently with regard to escape characters, and that this in turn changes how each interacts with \U, \L and friends. The basic issue is that in the regex engine escapes do not mean their literal equivalent, whereas in a double-quoted string they do. Arguably, in regex quoting, \Q, \L, \U and friends should be deferred to the regex engine and act as modifiers to the regex parser, rather than being converted by the toker at all. We should focus on getting the rules right for double-quoted strings, and then have the regex engine simulate that as much as is sensible.

Consider that /a\x{7c}b/ matches very differently to /a|b/, but "a\x{7c}b" and "a|b" are the same strings.
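A short sketch of that difference, with illustrative variable names: in a string, \x{7c} is just another way to write "|", while in a pattern it is a literal bar rather than the alternation operator.

```perl
use strict;
use warnings;

# As strings, the escape and the literal character are identical.
my $escaped_string = "a\x{7c}b";
my $literal_string = "a|b";
print "strings equal\n" if $escaped_string eq $literal_string;

# As patterns, they differ: a bare | is alternation, \x{7c} is a literal bar.
my $alternation = qr/a|b/;
my $escaped_bar = qr/a\x{7c}b/;

print "alternation matches 'a'\n"         if "a" =~ $alternation;
print "escaped bar does not match 'a'\n"  if "a" !~ $escaped_bar;
print "escaped bar matches 'a|b'\n"       if "a|b" =~ $escaped_bar;
```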

demerphq avatar Aug 05 '22 13:08 demerphq