Weird behavior with `unary -` operator
Hi there,
I'm seeing some weird behavior when using the unary - operator on integers combined with method calls.
Example:
print -1.to_s
I got:
Error: method `unary -` does not exists in `String`.
print -1.to_s
^
I was really expecting the unary operator to take priority over the call.
More weird:
print (-1).to_s
I got:
test.nit:2,1--10: Error: expected an expression.
print (-1).to_s
^
Old thing. It is because the precedence of the unary operator is lower than that of `.`.
This is quite standard and expected for things like -a.x.
However, I agree that the - before integers should not be parsed as a low-precedence operator but as part of literal (negative) integers. I do not remember why it was not done this way before (maybe procrastination + no one complained up to now).
Even with -a.x, my first guess would be that it's equal to (-a).x rather than -(a.x).
But you are right, this seems to be the same behavior as Java.
reopened. -1.foo is still an issue.
The following strikes me as counter-intuitive:
print -1.plus_one # 0
var x = 1
print -x.plus_one # -2
I would, instead, give `-` higher precedence than `.`.
In other words, the precedence of - would be higher than that of right-unary operators, which in turn would be higher than that of remaining left-unary operators.
The totally acceptable alternative is to leave things as they currently are:
var x = 3
print -x.exponent(2) # -9
print -3.exponent(2) # -9
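For reference, here is a minimal runnable sketch of that current behaviour. `exponent` is not assumed to be in the standard library; it is added here as a hypothetical refinement of `Int` so the example is self-contained:

redef class Int
    # Hypothetical helper (assuming `exponent` is not already defined):
    # raise `self` to the power `e`
    fun exponent(e: Int): Int
    do
        var r = 1
        for i in [0 .. e[ do
            r = r * self
        end
        return r
    end
end

var x = 3
print -x.exponent(2) # -9, parsed as -(x.exponent(2))
print -3.exponent(2) # -9, parsed as -(3.exponent(2))

Both lines print -9 because the call binds tighter than the unary minus, for literals and variables alike.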
Yet, if one were to accept a distinction between -128s8 (= -128 as a signed 8-bit two's-complement number) and - 128s8 (a compiler error, as 128 does not fit within a signed 8-bit two's-complement number), then it would be OK to keep a lower precedence for the - left operator than for the . right operator, while allowing a sign to be included within an integer literal.
Some info on what some other languages are doing.
C, C++, Java:
- `-x.foo` parsed as `-(x.foo)`
- `1.foo` statically refused: `invalid suffix "foo" on floating constant`
- `(1).foo` statically refused: `request for member ‘foo’ on int`
Python, JavaScript:
- `-x.foo` parsed as `-(x.foo)`
- `1.foo` statically refused: `invalid syntax` and `Unexpected token ILLEGAL` (both precise and useful error messages)
- `(1).foo` dynamically refused: `'int' object has no attribute 'foo'`
Ruby:
- `-x.foo` parsed as `-(x.foo)`
- `1.foo` accepted and works (if `foo` is defined)
- `-1.foo` accepted as `(-1).foo`, where `-1` is a literal
- `- 1.foo` accepted as `-(1.foo)`, where `-` is the neg operator
Nit (current):
- `-x.foo` parsed as `-(x.foo)`
- `1.foo` accepted and works (if `foo` is defined)
- `-1.foo` accepted as `-(1.foo)`, where `-` is the neg operator
This might be just me, but I really do think that the current spec is unPOLA (it violates the principle of least astonishment).
Maybe for seasoned developers such as you, @privat, or @egagnon, this seems normal, but I, for one, was really surprised by the spec.
When I write -1.foo, I do mean (-1).foo, not -(1.foo); visually speaking, the - is attached to the 1.
The current spec managed to surprise me, and it managed to surprise @Morriar; in fact, I do think it might be beneficial to start parsing -1.foo as (-1).foo, as it definitely makes more sense in my opinion.
And, since I will get this argument: yes, all the other languages seem to do it differently, but that does not mean we ought to do the same thing if we think it's a mistake, right?
So, the literal thing is used in Ruby, the only other language with integer objects. I'm OK with it.
I'm resurrecting the thread; I'd vouch for a generalized approach à la Ruby.
i.e.
var x = -1.foo # Parsed as (-1).foo
var y = - 1.foo # Parsed as -(1.foo)
var z = -x.foo # Parsed as (-x).foo
var a = - x.foo # Parsed as -(x.foo)
This would remove the ambiguity of Ruby's behaviour by making the parsing of literals and variables uniform, and imho, this is a clear way of expressing all this.
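For concreteness, a minimal sketch of what the four proposed readings would compute, written with today's unambiguous forms (explicit parentheses and temporaries). `foo` here is a hypothetical method added to `Int` by refinement, purely for illustration:

redef class Int
    # Hypothetical stand-in for `foo` in the examples above
    fun foo: Int
    do
        return self + 1
    end
end

var x = 1
var neg_one = -1
var neg_x = -x
print neg_one.foo # 0: the proposed reading of -1.foo, i.e. (-1).foo
print -(1.foo)    # -2: the proposed reading of - 1.foo, i.e. -(1.foo)
print neg_x.foo   # 0: the proposed reading of -x.foo, i.e. (-x).foo
print -(x.foo)    # -2: the proposed reading of - x.foo, i.e. -(x.foo)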
@R4PaSs I must disagree. Look at the color of -1 in your -1.foo example; it is all blue. It is a single token. So, there's no - operator involved.
I don't think that -x should be treated as a single token. So, if there are two tokens, - and x, then there must not be a difference between -x and - x.
There are other cases in existing programming languages where the placement of a space character makes a difference in lexer behavior:
x = y++ +z;
x = y+ ++z;
The fun one is:
x=y+++z;
@egagnon In job interviews, some companies even ask interns what this assignment does. :smile:
@egagnon: If the colour is the problem, I can fix that :) Though actually, with the current behaviour, it should not be blue, since the current semantics apply the unary minus to the result of foo, my bad!
I admit your answer has got me thinking. I do not know yet what the best behaviour should be, but I certainly am surprised by the spec right now (my lack of knowledge is to blame here, since most languages do it like this).
And I know I'd be surprised by an inconsistency in parsing between literals and regular identifiers: if we choose to parse -1.foo as (-1).foo, I think we ought to do the same with identifiers.
I admit I like the Ruby way of parsing literals:
- `-1.foo` accepted as `(-1).foo`
- `- 1.foo` accepted as `-(1.foo)`
And, for consistency, I think we should do the same on identifiers; this would be the least surprising way.
That, or we call this issue irrelevant and accept that the current system, as surprising as it might have been from my point of view, has the merit of being consistent.
@R4PaSs One should not confuse scanning and parsing. Traditional languages go through two distinct phases: scanning (lexer) and parsing (parser). This can sometimes have a surprising visual result for someone who does not understand this split process. But most programmers eventually learn about this split, and it becomes obvious to them that the lexer wins (as in y+++z, which maximal munch lexes as y ++ + z, i.e. (y++)+z). It is the same with preprocessors, where the preprocessor wins over the lexer and parser.
For Nit, the question is actually about what kind of tokens do we want:
- Is `-1` one or two tokens? The Ruby approach says: one token. (In languages like C, C++, and Java, it makes no visible difference.)
- Is `-x` one or two tokens? All of the languages I know say: two tokens.
I am in favor of having -1 as a single token, but I am against introducing a new unintuitive signed identifier token type (where intuition is derived from programming in other languages; yes, it is a cultural thing).
@R4PaSs Do you really want a parser error in the following code?
x=y-z
There would be a parsing error on -z: expecting an operator but got a signed variable instead.
The signed integer choice would also create a problem:
x=y-1
Parsing error on -1: expecting an operator but got a signed integer instead.
Actually, this is a very good argument in favor of not introducing signed integer literals...
@egagnon I agree, right now in Nit, -1 is two tokens too, and I do not think it should change.
Maybe, to have a POLA behaviour (as I mean it, at least), the - should have greater precedence than the .
This might be a surprise for people coming from other languages though.
The lexer need not change; the parser does.
This would solve the issue by parsing -1.foo as (-1).foo, and should we want the current behaviour, we could still express it as -(1.foo).
The current spec is also acceptable in this regard; it's all a matter of personal preference at this point.
@privat I am intrigued. What does Ruby do with x=y-1?
Maybe they use a mechanism of the lexer (like the states in SableCC) to indicate that a literal cannot immediately follow an identifier.
http://programmingisterrible.com/post/42432568185/how-to-parse-ruby
OK, Ruby's approach is ugly. If we really want special treatment for signed integer literals in Nit, I think it should be done at the syntax level. Something like:
// ambiguous syntax
term =
    {integer} integer_constant | ...
integer_constant =
    sign? integer_literal;
sign =
    {plus} plus |
    {minus} minus;
Of course, one has to rewrite this grammar part to eliminate the ambiguity between the unary-op reading and the sign reading, but it should be feasible. As a result, we would get:
-1.foo # equivalent to (-1).foo
- 1.foo # equivalent to (- 1).foo
-x.foo # equivalent to -(x.foo)
- x.foo # equivalent to - (x.foo)
x = y - 1 # equivalent to x = y - (1)
In other words, ignored tokens would remain ignored (POLA).
Of course, this doesn't change anything about the lexer/parser separation:
- -1 # unary_op sign int
--1 # autodecrement int => error
I tried to implement some variations but I completely broke the grammar. I'm tempted to wait for sablecc4 before trying more.