Hex, octal and binary constants should not be unsigned by default
The current semantics for integer constants are a small departure from C: binary, octal and hexadecimal constants are implicitly unsigned. This is not true in C, except when the constant lies between 0x80000000 and 0xffffffff, or in the equivalent 64-bit range.
With the current C3 semantics, -0x1 is positive but would be negative in C. If C3 is meant to adhere to the C semantics for integer constants, this should be fixed.
Test case:
// These pass OK
$assert(-0x80000000 > 0);
$assert(-0x80000000 == 0x80000000);
$assert(-0x8000000000000000 > 0);
$assert(-0x8000000000000000 == 0x8000000000000000);
$assert($sizeof(0x7fffffff) == 4);
$assert($sizeof(0x80000000) == 4);
$assert($sizeof(2147483648) == 8);
// These don't:
$assert(-0x1 < 0);
$assert(-01 < 0);
$assert(-0b1 < 0);
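For comparison, here is a rough C equivalent of the same checks (a sketch assuming a typical LP64 target with 32-bit int and 64-bit long):
#include <assert.h>

int main(void)
{
    // 0x1 and 01 fit in int, so negating them gives ordinary negative ints.
    assert(-0x1 < 0);
    assert(-01 < 0);
    // 0x80000000 does not fit in a 32-bit int, so it has type unsigned int;
    // negating it wraps around and the result compares greater than zero.
    assert(-0x80000000 > 0);
    assert(sizeof(0x80000000) == 4);
    // Decimal constants never fall back to unsigned, so this becomes a long.
    assert(sizeof(2147483648) == 8);
    return 0;
}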
These C semantics are quite confusing; I suggest that negating a uint or a ulong should generate an error or at least a warning.
My proposed change would do the following:
- Bake the - so that the check is on the actual value.
- Negative numbers are always signed.
- Hex/oct/binary will use the number of characters to determine the minimum type. For example, a 16-character hex will be at least long/ulong.
- Negating an explicitly unsigned literal is an error, e.g. -1U.
- Hex/oct/binary is unsigned by default, but signed if preceded by - or given an i suffix.
- Dec is signed by default.
Any opinion on this, @chqrlie?
Ping
I'll close it then.
My proposed change would do the following:
Sorry about the lag, I did not get notified for this proposal
- Bake the - so that the check is on the actual value.
I am not sure what you mean by Bake the -... If you mean that -1 becomes an integer literal instead of an expression, I don't like it and I think it will create problems in macros and templates.
- Negative numbers are always signed.
There are no negative numbers, there are signed types that have a negative value. Expressions have a type that is either signed or unsigned. Expressions involving literals should behave the same as the same expressions with named constants or variables.
The subtle questions are:
- what is the type of the subtraction of 2 unsigned types?
- what is the type of the negation of an unsigned type?
- Hex/oct/binary will use the number of characters to determine the minimum type. For example, a 16-character hex will be at least long/ulong.
This is questionable: would 0x000000000 be a ulong then?
- Negating an explicitly unsigned literal is an error, e.g. -1U.
I tend to agree on this one. More generally, negating an unsigned expression should at least generate a warning, possibly an error.
- Hex/oct/binary is unsigned by default, but signed if preceded by - or given an i suffix.
This rule is too subtle for most programmers and does not fix the problem: would i + 0xFF still become unsigned ?
This is even less intuitive than the C rule (Hex/oct/binary are signed by default unless they have a value in the ranges [INT_MAX+1 .. UINT_MAX], [LONG_MAX+1 .. ULONG_MAX], [LLONG_MAX+1 .. ULLONG_MAX]).
My take is that i + 0xFF should have the same type as +i, i.e. the type of i after integer promotion. With hex constants unsigned by default, this would not be true if i is an int, as 0xFF would be a uint.
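To make the C rule concrete, a small sketch (assuming 32-bit int; TYPE_NAME is just an ad-hoc helper macro, not a standard one):
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x), int: "int", unsigned int: "unsigned int", \
                              long: "long", unsigned long: "unsigned long", \
                              default: "other")

int main(void)
{
    int i = 1;
    // 0xFF fits in int, so the sum keeps the type of the promoted i.
    printf("i + 0xFF       -> %s\n", TYPE_NAME(i + 0xFF));        // int
    // 0xFFFFFFFF does not fit in a 32-bit int, so it has type unsigned int
    // and the usual arithmetic conversions make the whole sum unsigned.
    printf("i + 0xFFFFFFFF -> %s\n", TYPE_NAME(i + 0xFFFFFFFF));  // unsigned int
    return 0;
}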
- Dec is signed by default
Agreed. That's the C rule and most compilers issue a warning for 18446744073709551615 as it becomes unsigned due to lack of a large enough signed type.
The -1 parsing solves the problem of being able to write INT_MIN without the type being promoted to long.
Consider sizeof(-2147483648) in C. This one returns 8, while sizeof(-2147483647) returns 4. Including - in parsing means that C3 can give the type of -2147483648 to be int and not long.
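A minimal C illustration of this (sizes assume 32-bit int and 64-bit long):
#include <stdio.h>

int main(void)
{
    // 2147483648 does not fit in int, so the literal is a long and the unary
    // minus applies to that long: the expression is 8 bytes wide.
    printf("%zu\n", sizeof(-2147483648));  // 8
    // 2147483647 fits in int, so its negation stays an int: 4 bytes.
    printf("%zu\n", sizeof(-2147483647));  // 4
    return 0;
}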
what is the type of the subtraction of 2 unsigned types? what is the type of the negation of an unsigned type?
The same unsigned type for both.
This is questionable: would 0x000000000 be a ulong then?
Yes.
would i + 0xFF still become unsigned
No, in C3 signed dominates over unsigned. If i is an int then we get i + (uint)0xFF after promotion. The maximal type is then selected, which is int, after which both sides are implicitly converted to int, leaving the end result as i + (int)0xFF.
Since we continue the discussion, let me reopen this.
The -1 parsing solves the problem of being able to write INT_MIN without the type being promoted to long. Consider sizeof(-2147483648) in C. This one returns 8, while sizeof(-2147483647) returns 4. Including - in parsing means that C3 can give the type of -2147483648 to be int and not long.
Indeed sizeof(-2147483648) is 8 whereas sizeof(-2147483647-1) is 4 in C, and this is not very intuitive, yet I would much prefer a warning on 2147483648 suggesting the use of the L suffix.
Baking the unary - into the integer literal token is opening Pandora's box: here are a few examples:
- what is the type of -0x1?
- do we have sizeof(-2147483648) != sizeof(- 2147483648)?
- if you ever thought of adding the exponentiation operator **, this would make -1**2 equal 1 instead of -1.
- in a template, how would you parse -n where n is a template argument ?
what is the type of the subtraction of 2 unsigned types? what is the type of the negation of an unsigned type? The same unsigned type for both.
OK, what about mixed types ? int + uint -> int or uint ?
This is questionable: would 0x000000000 be a ulong then?
Yes.
It looks like a hack... the L suffix is a much more readable way to specify the type: 0x0L or 0L. Btw would 0x0L be a ulong ?
would i + 0xFF still become unsigned
No, in C3 signed dominates over unsigned. If i is an int then we get i + (uint)0xFF after promotion. The maximal type is then selected, which is int, after which both sides are implicitly converted to int, leaving the end result as i + (int)0xFF.
This is a major departure from the C semantics where int + uint -> uint. Nasty side effect:
uint x = 0xffffffff;
if (x - 1 > 0) {
    printf("OK");
} else {
    printf("not OK"); // this branch gets executed if adding a signed and an unsigned evaluates to a signed
}
Even worse: x + 0 becomes signed too :(
There is no magical solution to this semantic nightmare, but departing from subtle rules documented and learned by millions of programmers seems a bad idea. Simplicity should dictate this:
- the same semantics should apply to expressions involving literals and variables
- there should be a clear rule to determine the type of a literal
- C expression semantics should not be changed unless absolutely necessary.
- ambiguous and confusing expressions should be marked as requiring parentheses or other explicit markers (suffix, casts...)
what is the type of -0x1 ?
int
do we have sizeof(-2147483648) != sizeof(- 2147483648) ?
No.
if you ever thought of adding the exponentiation operator **, this would make -1**2 equal 1 instead of -1.
I considered it very early on. It's definitely not in.
in a template, how would you parse -n where n is a template argument ?
The normal way.
OK, what about mixed types ? int + uint -> int or uint ?
int
Btw would 0x0L be a ulong ?
It would be a long
This is a major departure from the C semantics where int + uint -> uint. Nasty side effect:
Sign changing issues with uint only occur if they exceed INT_MAX. Compare this to the int + uint -> uint of C, which has issues for any negative value of int. It's just so much worse. Use U on constants when you have uint, that's the simple rule.
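In the same C-flavored pseudocode as the snippet above, the suggested convention would look roughly like this (a sketch, relying on the uint - uint -> uint rule stated earlier):
uint x = 0xffffffff;
// With the U suffix both operands are unsigned, so the subtraction stays a
// uint and the comparison takes the "OK" branch, just as it does in C.
if (x - 1u > 0) {
    printf("OK");
}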
the same semantics should apply to expressions involving literals and variables
I used to agree, but I found a different approach: make casts signal unsafe areas. That means trying to keep casts minimal. This means also accepting unsigned <-> signed conversions, even though they are unsafe for unsigned > INT_MAX. Because unsigned values are bad as general purpose types. People who use them to say "this value can't be less than zero" are misguided, at least under C semantics. They're good for optimizing storage and bit ops, but that's about it.
Unless there is something to add to this I'll close it?