Weird behavior with `unary -` operator

Open Morriar opened this issue 10 years ago • 26 comments

Hi there,

I'm having some weird behavior when trying to use the unary - operator on integers with calls.

Example:

print -1.to_s

I got:

Error: method `unary -` does not exists in `String`.
    print -1.to_s
          ^

I was really expecting the unary operator to take priority over the call.

Even weirder:

print (-1).to_s

I got:

test.nit:2,1--10: Error: expected an expression.
    print (-1).to_s
    ^

Morriar avatar May 25 '15 16:05 Morriar

Old thing. It is because the precedence of the unary operator is lower than that of `.`. This is quite standard and expected for things like -a.x.

However, I agree that the - before integers should not be parsed as an operator with a low precedence but as a part of literal (negative) integers. I do not remember why it was not done this way before (maybe procrastination + no one complained up to now).
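
A minimal Python sketch of the same precedence rule, using the standard `ast` module to show how `-a.x` is grouped. Python is used only for illustration, since it follows the same convention; `describe` and `shape` are helper names made up for this example.

```python
import ast

def describe(node):
    """Render an AST node with explicit parentheses around sub-expressions."""
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        operand = describe(node.operand)
        if not isinstance(node.operand, ast.Name):
            operand = f"({operand})"
        return f"-{operand}"
    if isinstance(node, ast.Attribute):
        value = describe(node.value)
        if isinstance(node.value, ast.UnaryOp):
            value = f"({value})"
        return f"{value}.{node.attr}"
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported node: {type(node).__name__}")

def shape(source):
    """Parse a Python expression and show how it was grouped."""
    return describe(ast.parse(source, mode="eval").body)

print(shape("-a.x"))    # -(a.x)
print(shape("(-a).x"))  # (-a).x
```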

privat avatar May 25 '15 22:05 privat

Even with -a.x, my first guess would be that it's equal to (-a).x rather than -(a.x).

Morriar avatar May 25 '15 22:05 Morriar

But you are right, this seems to be the same behavior as in Java.

Morriar avatar May 25 '15 22:05 Morriar

Reopened. -1.foo is still an issue.

privat avatar May 25 '15 22:05 privat

The following strikes me as counter-intuitive:

print -1.plus_one # 0
var x = 1
print -x.plus_one # -2

I would, instead, give - higher precedence than `.`. In other words, the precedence of - would be higher than that of right-unary operators, which in turn would be higher than that of remaining left-unary operators.
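
The two groupings can be spelled out in Python to make the asymmetry concrete. This is only a sketch: `Num` and `plus_one` are hypothetical stand-ins for an integer object with a method, not anything from Nit.

```python
class Num(int):
    """Hypothetical integer object with a plus_one method."""

    def plus_one(self):
        return Num(int(self) + 1)

    def __neg__(self):
        return Num(-int(self))

x = Num(1)

sign_first = (-x).plus_one()   # what a signed literal would mean: (-1) + 1
call_first = -(x.plus_one())   # what a low-precedence unary minus means: -(1 + 1)
default = -x.plus_one()        # Python groups this as -(x.plus_one())

print(sign_first, call_first, default)  # 0 -2 -2
```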

egagnon avatar May 26 '15 11:05 egagnon

The totally acceptable alternative is to leave things as they currently are:

var x = 3
print -x.exponent(2) # -9
print -3.exponent(2) # -9
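
Python's `**` operator follows the same convention, which is probably the most familiar argument for keeping unary minus low: the literal and the variable give the same result. A small sketch:

```python
x = 3

with_variable = -x ** 2   # grouped as -(x ** 2)
with_literal = -3 ** 2    # grouped as -(3 ** 2), same as the variable case
explicit = (-3) ** 2      # parentheses are needed to square the negative number

print(with_variable, with_literal, explicit)  # -9 -9 9
```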

egagnon avatar May 26 '15 11:05 egagnon

Yet, if one was to accept a distinction between -128s8 ( = -128 as a signed 8-bit two's complement number) and - 128s8 (compiler error as 128 does not fit within a signed 8-bit two's complement number), then it would be OK to keep a lower precedence for the - left-operator than for the . right operator, but to allow for including a sign within an integer literal.
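
A rough sketch of why the distinction matters for a sized literal. The helper `i8` is hypothetical and only mimics a range check on a signed 8-bit value; it says nothing about how Nit actually checks literals.

```python
def i8(value):
    """Hypothetical range check for a signed 8-bit two's complement value."""
    if not -128 <= value <= 127:
        raise OverflowError(f"{value} does not fit in a signed 8-bit integer")
    return value

# Sign is part of the literal: -128 is checked as a whole and fits.
signed_literal = i8(-128)

# Sign is an operator: 128 is checked first and is rejected before negation.
try:
    negated = -i8(128)
    negated_ok = True
except OverflowError:
    negated_ok = False

print(signed_literal, negated_ok)  # -128 False
```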

egagnon avatar May 28 '15 13:05 egagnon

Some info on what other languages are doing.

C, C++, Java:

  • -x.foo parsed as -(x.foo)
  • 1.foo statically refused: invalid suffix "foo" on floating constant
  • (1).foo statically refused: request for member ‘foo’ on int

Python, JavaScript:

  • -x.foo parsed as -(x.foo)
  • 1.foo statically refused: invalid syntax and Unexpected token ILLEGAL (both precise and useful error messages)
  • (1).foo dynamically refused: 'int' object has no attribute 'foo'

Ruby:

  • -x.foo parsed as -(x.foo)
  • 1.foo accepted and works (if foo is defined)
  • -1.foo accepted as (-1).foo where -1 is a literal
  • - 1.foo accepted as -(1.foo) where - is the neg operator

Nit (current):

  • -x.foo parsed as -(x.foo)
  • 1.foo accepted and works (if foo is defined)
  • -1.foo accepted as -(1.foo) where - is the neg operator
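
The Python entries above can be checked with a short script. This is a sketch only; `classify` is a helper name invented for the example, and `x` is bound to 1 when the expression is evaluated.

```python
def classify(source):
    """Say whether Python rejects an expression at compile time or at run time."""
    try:
        code = compile(source, "<example>", "eval")
    except SyntaxError:
        return "statically refused"
    try:
        eval(code, {"x": 1})
    except AttributeError:
        return "dynamically refused"
    return "accepted"

for source in ["1.foo", "(1).foo", "-x.foo", "(1).real", "-x.real"]:
    print(source, "->", classify(source))
```

`real` is an attribute that Python integers do have, so the last two expressions are accepted.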

privat avatar May 29 '15 00:05 privat

This might be just me, but I really do think that the current spec violates POLA (the principle of least astonishment).

Maybe for seasoned developers such as you, @privat or @egagnon, this seems normal, but as for me, I was really surprised by the spec.

When I write -1.foo, I do mean (-1).foo, not -(1.foo). Visually speaking, the - is attached to the 1.

The current spec managed to surprise me, and it managed to surprise @Morriar. In fact, I do think that it might be beneficial to start parsing -1.foo as (-1).foo; it definitely makes more sense in my opinion.

And, since I will get this argument: yes, all the other languages seem to do it differently, but that does not mean we ought to do the same thing if we think it's a mistake, right?

lbajolet avatar May 29 '15 01:05 lbajolet

So, the literal thing is used in Ruby, the only other language with integer objects. I'm OK with it.

egagnon avatar Jun 01 '15 02:06 egagnon

I'm resurrecting the thread. I'd argue for a generalized approach à la Ruby.

i.e.

var x = -1.foo # Parsed as (-1).foo
var y = - 1.foo # Parsed as -(1.foo)
var z = -x.foo # Parsed as (-x).foo
var a = - x.foo # Parsed as -(x.foo)

This would remove the ambiguity of Ruby's behaviour by making the parsing of literals and variables uniform and, imho, this is a clear way of expressing all this.

lbajolet avatar Jun 03 '15 22:06 lbajolet

@R4PaSs I must disagree. Look at the color of -1 in your -1.foo example; it is all blue. It is a single token. So, there's no - operator involved.

I don't think that -x should be treated as a single token. So, if there are two tokens, - and x, then there must not be a difference between -x and - x.

There are other cases in existing programming languages where the placement of a space character makes a difference in lexer behavior:

x = y++ +z;
x = y+ ++z;

egagnon avatar Jun 04 '15 00:06 egagnon

The fun one is:

x=y+++z;
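
A tiny Python sketch of the "longest match wins" rule that decides this case. The token set is cut down to what the example needs, and `tokenize` is a made-up helper; characters outside the token set (such as spaces) are simply skipped.

```python
import re

# "++" is listed before "+" so the lexer always takes the longest operator.
TOKEN = re.compile(r"\+\+|\+|=|;|[A-Za-z_]\w*")

def tokenize(source):
    """Split the source into tokens, ignoring anything that is not a token."""
    return TOKEN.findall(source)

print(tokenize("x = y++ +z;"))  # ['x', '=', 'y', '++', '+', 'z', ';']
print(tokenize("x = y+ ++z;"))  # ['x', '=', 'y', '+', '++', 'z', ';']
print(tokenize("x=y+++z;"))     # ['x', '=', 'y', '++', '+', 'z', ';']
```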

egagnon avatar Jun 04 '15 00:06 egagnon

@egagnon In job interviews, some enterprises even ask interns what this assignment does. :smile:

jcbrinfo avatar Jun 04 '15 13:06 jcbrinfo

@egagnon: If the colour is the problem, I can fix that :) Though actually, with the current behaviour, it should not be blue, since the current semantics apply the unary minus to the result of foo. My bad!

I admit your answer has got me thinking. I do not know yet what the best behaviour should be, but I certainly am surprised right now by the spec (my lack of knowledge is to blame here, since most languages do it like this).

And I know I'd be surprised by an inconsistency in parsing between literals and regular identifiers. If we choose to parse -1.foo as (-1).foo, I think we ought to do the same with identifiers.

I admit I like the Ruby way of parsing literals:

  • -1.foo accepted as (-1).foo
  • - 1.foo accepted as -(1.foo)

And, for consistency, I think we should do the same for identifiers; this would be the least surprising way.

That, or we call this issue irrelevant, and the current system, as surprising as it might have been from my point of view, has the merit of being consistent.

lbajolet avatar Jun 04 '15 14:06 lbajolet

@R4PaSs One should not confuse scanning and parsing. Traditional languages go through two distinct phases: scanning (lexer) and parsing (parser). This can, sometimes, have a surprising visual result to someone who does not understand this split process. But, most programmers eventually learn about this split and it becomes obvious to them that the lexer wins (as in y+++z). It is the same with preprocessors, where the preprocessor wins over the lexer and parser. For Nit, the question is actually about what kind of tokens we want:

  • Is -1 one or two tokens? The Ruby approach says: one token. (In languages like C, C++, and Java, it makes no visible difference).
  • Is -x one or two tokens? All of the languages I know say: two tokens.

I am in favor of having -1 as a single token, but I am against introducing a new unintuitive signed identifier token type (where intuition is derived from programming in other languages; yes, it is a cultural thing).

egagnon avatar Jun 04 '15 15:06 egagnon

@R4PaSs Do you really want a parser error in the following code?

x=y-z

There would be a parsing error on -z: expecting an operator but got a signed variable instead.

egagnon avatar Jun 04 '15 15:06 egagnon

The signed integer choice would also create a problem:

x=y-1

Parsing error on -1: expecting an operator but got a signed integer instead.
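
The problem is easy to reproduce with two toy lexers in Python, one with a signed integer token and one without. Both token sets are assumptions made for this sketch and cover only what `x=y-1` needs.

```python
import re

# Lexer A: the sign is part of the integer token.
SIGNED = re.compile(r"-?\d+|[A-Za-z_]\w*|[-=]")
# Lexer B: integers are unsigned, "-" is always its own token.
UNSIGNED = re.compile(r"\d+|[A-Za-z_]\w*|[-=]")

signed_tokens = SIGNED.findall("x=y-1")
unsigned_tokens = UNSIGNED.findall("x=y-1")

print(signed_tokens)    # ['x', '=', 'y', '-1']  -> two operands in a row, no operator
print(unsigned_tokens)  # ['x', '=', 'y', '-', '1']
```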

egagnon avatar Jun 04 '15 15:06 egagnon

Actually, this is a very good argument in favor of not introducing signed integer literals...

egagnon avatar Jun 04 '15 15:06 egagnon

@egagnon I agree. Right now in Nit, -1 is two tokens too, and I do not think that should change.

Maybe, to get POLA behaviour (as I understand it at least), the - should have a higher precedence than the `.`.

This might be a surprise for people coming from other languages though.

The lexer need not change; the parser does.

This would solve the issue by parsing -1.foo as (-1).foo and, should we want the current behaviour, we could express it as -(1.foo).

The current spec is also acceptable in this regard; it's all a matter of personal preference at this point.

lbajolet avatar Jun 04 '15 15:06 lbajolet

@privat I am intrigued. What does Ruby do with x=y-1?

egagnon avatar Jun 04 '15 15:06 egagnon

Maybe they use a lexer mechanism (like the states in SableCC) to indicate that a literal cannot immediately follow an identifier.
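
A rough Python sketch of that idea: the lexer looks at the previous token and only accepts a signed literal when the previous token cannot end an operand. This is a guess at the general technique, not a description of Ruby's actual lexer or of SableCC states; `tokenize` and the token set are made up for the example.

```python
import re

TOKEN = re.compile(r"\s*(-?\d+|[A-Za-z_]\w*|[-+=().])")
OPERAND_END = re.compile(r"[\w)]$")  # previous token is a name, a number or ")"

def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        token = match.group(1)
        after_operand = bool(tokens) and OPERAND_END.search(tokens[-1])
        if token.startswith("-") and len(token) > 1 and after_operand:
            # Right after an operand, "-" must be the binary operator:
            # emit it alone and rescan the digits as a separate token.
            token = "-"
            pos = match.start(1) + 1
        else:
            pos = match.end()
        tokens.append(token)
    return tokens

print(tokenize("x=y-1"))    # ['x', '=', 'y', '-', '1']
print(tokenize("x=-1"))     # ['x', '=', '-1']
print(tokenize("-1.foo"))   # ['-1', '.', 'foo']
print(tokenize("- 1.foo"))  # ['-', '1', '.', 'foo']
```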

jcbrinfo avatar Jun 04 '15 18:06 jcbrinfo

http://programmingisterrible.com/post/42432568185/how-to-parse-ruby

jcbrinfo avatar Jun 04 '15 18:06 jcbrinfo

OK, Ruby's approach is ugly. If we really want special treatment for signed integer literals in Nit, I think that it should be done at the syntax level. Something like:

// ambiguous syntax
term =
  {integer} integer_constant | ...
integer_constant =
  sign? integer_literal;
sign =
  {plus} plus |
  {minus} minus;

Of course, one has to rewrite this part of the grammar to eliminate the ambiguity between the unary-op alternative and the sign alternative, but it should be feasible. As a result, we would get:

-1.foo # equivalent to (-1).foo
- 1.foo # equivalent to (- 1).foo
-x.foo # equivalent to -(x.foo)
- x.foo # equivalent to - (x.foo)
x = y - 1 # equivalent to x = y - (1)

In other words, ignored tokens would remain ignored (POLA).
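
A small recursive-descent sketch in Python of how such a rule could behave. It is not the Nit grammar: the token set, the `Parser` class and the output format are all assumptions made for illustration. The lexer never produces a signed token; the parser alone decides that a `-` directly in front of an integer is a sign.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-.()])")

def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return token

    def expr(self):
        # expr := unary ("-" unary)*
        left = self.unary()
        while self.peek() == "-":
            self.take()
            left = f"({left} - {self.unary()})"
        return left

    def unary(self):
        if self.peek() == "-":
            self.take()
            nxt = self.peek()
            if nxt is not None and nxt.isdigit():
                # "-" followed by an integer: the sign is part of the constant.
                return self.postfix(f"(-{self.take()})")
            return f"-({self.unary()})"  # ordinary low-precedence unary minus
        return self.postfix(self.primary())

    def primary(self):
        token = self.take()
        if token == "(":
            inner = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return f"({inner})"
        if token in ("-", ".", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    def postfix(self, base):
        while self.peek() == ".":
            self.take()
            name = self.take()
            if not name.isidentifier():
                raise SyntaxError(f"expected a name after '.', got {name!r}")
            base = f"{base}.{name}"
        return base

def parse(source):
    parser = Parser(tokenize(source))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return result

for source in ["-1.foo", "- 1.foo", "-x.foo", "- x.foo", "y - 1", "y-1"]:
    print(source, "=>", parse(source))
```

Because whitespace is dropped before the parser runs, `-1.foo` and `- 1.foo` come out the same, and `y-1` is still a subtraction.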

egagnon avatar Jun 05 '15 11:06 egagnon

Of course, this doesn't change anything about the lexer/parser separation:

- -1 # unary_op sign int
--1 # autodecrement int => error

egagnon avatar Jun 05 '15 12:06 egagnon

I tried to implement some variations but I completely broke the grammar. I'm tempted to wait for sablecc4 before trying more.

privat avatar Aug 06 '15 20:08 privat