Compiler: accept initializer lists in assignments, extend enum scoping
InitListexpressions can be used in assignments, function arguments and return values.- add
Expr.isAssignment()andExpr.isInitlistAssignment()to simplify tests - enum values are automatically scoped in the same places and in comparisons
I lot to consider. I'm trying to come up with corner cases we can put in tests..
It would be nice to fix the TODO semi (c2_parser_expr.c2), otherwise it can remain open.
This commit is probably the biggest language change so far (not code-wise, but impact-wise). I like the use when using simple structs, like point. So instead of
// old
Point p = { 2, 3 }
return p;
//new
return { 2, 3 }
For enums, there are some caveats: Color c = Red; // okay, was already possible before this patch.
c = Blue; // possible since this patch if (c == Blue) // Error: use of undeclared identifier 'Blue'. I wouldn't mind enforcing Color.Blue in both case above, since Color only appears once in the line (in contrast to the init).. Or at least, if line 1 words, line 2 should also work
setColor(Red); // now possible with this patch. Forcing prefix would not be problematic and more explicit..
This commit is probably the biggest language change so far (not code-wise, but impact-wise). I like the use when using simple structs, like point. So instead of
// old Point p = { 2, 3 } return p; //new return { 2, 3 }
If we ever support compound literals, a third option might be more readable:
return (Point){ 2, 3 }
For enums, there are some caveats:
Color c = Red; // okay, was already possible before this patch.c = Blue;// possible since this patch,if (c == Blue) // Error: use of undeclared identifier 'Blue'.I wouldn't mind enforcingColor.Bluein both cases above, sinceColoronly appears once in the line (in contrast to theinit).. Or at least, if line 2 works, line 3 should also work
I agree line 3 should also work, as well as if (kind >= KW_bool && kind <= KW_void)
setColor(Red); // now possible with this patch. Forcing prefix would not be problematic and more explicit...
setColor(Red) is more elegant than setColor(Color.Red) where Color is repeated and feels redundant.
If the enum type is not obvious, specifying the type makes it more readable and is of course possible: whether to write SetFontAttributes(12, Style.Bold, Color.Red) or SetFontAttributes(12, Bold, Red) is a matter of style or personal preference. Both are possible and supported.
In another case, SetFontAttributes(12, Default, Default) is less readable than SetFontAttributes(12, Style.Default, Color.Default), although both would al be accepted and correct even if Style.Default and Color.Default have a different value.
Minor typo in commit msg: intiializers (makes git log nicer)
Minor typo in commit msg: intiializers (makes git log nicer)
Yes, I forgot to fix that. Changed the commit line to:
Compiler: accept initializer lists in assignments, extend enum scoping
I was looking through the code and in analyseBinaryOperator() there is a check if opcode == Assign. That got me thinking about the other assign (+= -= /= etc). Those currently dont work without a prefix. It is questionable whether this would be useful for enums. Enum +=
There is also the issue of = { .. } ; where the semicolon doesn't give an error (it should).
What about?: Point p = { 1, 2 } p = { 3, 4 } = { 5, 6 }
(i pushed a change to master to get a better error location for that)
I was looking through the code and in
analyseBinaryOperator()there is a check ifopcode == Assign. That got me thinking about the other assign (+=,-=,/=etc). Those currently dont work without a prefix. It is questionable whether this would be useful for enums. Enum += would be ok, but Enum += Enum-Constant would be weird.
I agree, I cannot think of a good use case for enum scoping for these operators.
The ternary operator should support my_enum = condition ? Enum_const1 : Enum_const2; without a prefix.
Arrays could be defined to take an enum value as an index and [Enum_const] would be scoped without a prefix, but we would need a specific declaration syntax and I think it would be more useful to support the C3 enum attributes instead.
There is also the issue of
var = { .. } ;where the semicolon doesn't give an error (it should).
I am not sure it should give an error: The ; should be ignored as it is redundant and this should be implemented generically for expression statements by looking at the last token or using a flag in the Parser object. There are corner cases where the ; help readability.
What about?:
Point p = { 1, 2 } p = { 3, 4 } = { 5, 6 }
This should give an error on the second =. But Point p1; Point p2; p1 = p2 = { 1, 2 } should work.
(i pushed a change to master to get a better error location for that)
I'll take a look.
I was looking through the code and in
analyseBinaryOperator()there is a check ifopcode == Assign. That got me thinking about the other assign (+=,-=,/=etc). Those currently dont work without a prefix. It is questionable whether this would be useful for enums. Enum += would be ok, but Enum += Enum-Constant would be weird.I agree, I cannot think of a good use case for enum scoping for these operators. The ternary operator should support
my_enum = condition ? Enum_const1 : Enum_const2;without a prefix. Arrays could be defined to take an enum value as an index and[Enum_const]would be scoped without a prefix, but we would need a specific declaration syntax and I think it would be more useful to support the C3 enum attributes instead.There is also the issue of
var = { .. } ;where the semicolon doesn't give an error (it should).I am not sure it should give an error: The
;should be ignored as it is redundant and this should be implemented generically for expression statements by looking at the last token or using a flag in theParserobject. There are corner cases where the;help readability.What about?:
Point p = { 1, 2 } p = { 3, 4 } = { 5, 6 }This should give an error on the second
=. ButPoint p1; Point p2; p1 = p2 = { 1, 2 }should work. It complains about missing a semi-colon after } :-). We should add unit tests for those situations(i pushed a change to master to get a better error location for that)
I'll take a look.
(i pushed a change to master to get a better error location for that)
The error code is still somewhat confusing:
point.c2:32:7: error: lvalue required as left operand of assignment
p = { 1, 2 } = { 3, 4 }
~~^~~~~~~~~~
The assignment the error message refers to is the second =, not the one pointed to by the ^. The grammar is incorrect: p = { 1, 2 } should be considered a complete expression that cannot participate in another assignment, nor any operator with a lesser priority. A specific test might be needed.
Here is another example:
point.c2:39:19: error: invalid operands to binary expression ('point.Point' and 'i32')
p1 = { 1, 2 } + 1;
^
point.c2:39:19: error: expression without effect
p1 = { 1, 2 } + 1;
~~~~~~~~~~~~~~^~~
Even worse:
fn void test(char *s) {
Point p1;
p1 = { 1, 2 }
*s = 'a';
}
Gives this surprising error:
point.c2:44:5: error: invalid operands to binary expression ('point.Point' and 'char*')
*s = 'a';
^
I'll go back to the drawing board and change the way initlists are parsed in assignment expressions.
I would prefer to keep patches as small as possible, since reviewing smaller ones is a lot easier... So a complete overhaul of the expr parsing would probably be too much..
I would prefer to keep patches as small as possible, since reviewing smaller ones is a lot easier... So a complete overhaul of the expr parsing would probably be too much..
I just fixed these issues with minimal changes and more explicit error messages :)
Many semicolon issues seems to pop up on this branch.
(global) Point p = { 1, 2 } ; // should be error
(in function) Point p = { 1, 2 }; // should be error p = {1,2 }; // should be error
Maybe you could also modify the branch in a 2nd commit, that way it's easier to see what changed for me..
The Tokenizer.charAt() is really nasty, since it breaks the clean interface of tokens... I'll see if I can find a nice solution. Update: I pushed a solution to branch initlists (on top of your branch). Update 2: it's still possible to find corner cases, maybe we need to change the rules, so that any initlist is not followed by a semi-colon and these Init-list Expr are?
return { 2, 3 };
return g_point = { 2, 3 };
or
return { 2, 3 }
return g_point = { 2, 3 }
I missed your last message before pushing my latest version. Let me merge your changes and rework this.
The Tokenizer.charAt() is really nasty, since it breaks the clean interface of tokens... I'll see if I can find a nice solution.
A cleaner solution is to store the kind of the last token in the Parser object and test that.
The Tokenizer.charAt() is really nasty, since it breaks the clean interface of tokens... I'll see if I can find a nice solution.
A cleaner solution is to store the kind of the last token in the Parser object and test that.
yes, but that would mean storing the last one for every token... way too slow IMHO..
The Tokenizer.charAt() is really nasty, since it breaks the clean interface of tokens... I'll see if I can find a nice solution.
A cleaner solution is to store the kind of the last token in the Parser object and test that.
yes, but that would mean storing the last one for every token... way too slow IMHO..
We already compute p.prev_loc = p.tok.loc + p.tok.len for every token. Storing the first 64-bits of the last token or even the full token might not be that expensive. Only testing can show one way or another.
Maybe you could also modify the branch in a 2nd commit, that way it's easier to see what changed for me..
I rebased the branch on the latest master.
I just pushed a patch to produce a more explicit error message for redundant trailing semicolon: if the ; appears just after the last token (RBrace if not need_semi) then the error message becomes semicolon is not accepted after initializerinstead ofexpected expression` which is somewhat puzzling.
I still think that these redundant semicolons should just be accepted and ignored though.
Yes, changes in a new commit are much easier to follow for me. I think the new error message is indeed an improvement.
Language wise, i prefer to keep the variations in syntax minimal. So making semicolons optional is not good then. Either they are required or forbidden. I think allowing semicolons after all statements is ok. This does create a small difference with globals (where they are forbidden). That is the price of the init-list in expressions.
Language wise, i prefer to keep the variations in syntax minimal. So making semicolons optional is not good then. Either they are required or forbidden. I think allowing semicolons after all statements is ok. This does create a small difference with globals (where they are forbidden). That is the price of the init-list in expressions.
I agree. This would definitely simplify the parser since need_semi would no longer be needed (in most places, possibly all).
Language wise, i prefer to keep the variations in syntax minimal. So making semicolons optional is not good then. Either they are required or forbidden. I think allowing semicolons after all statements is ok. This does create a small difference with globals (where they are forbidden). That is the price of the init-list in expressions.
I agree. This would definitely simplify the parser since
need_semiwould no longer be needed (in most places, possibly all).
ok lets make it so. Shall I change it on top of your commit or will you?
Language wise, i prefer to keep the variations in syntax minimal. So making semicolons optional is not good then. Either they are required or forbidden. I think allowing semicolons after all statements is ok. This does create a small difference with globals (where they are forbidden). That is the price of the init-list in expressions.
I agree. This would definitely simplify the parser since
need_semiwould no longer be needed (in most places, possibly all).ok lets make it so. Shall I change it on top of your commit or will you?
I'll give it a try.
ok lets make it so. Shall I change it on top of your commit or will you?
I'll give it a try.
Well this is a non trivial change: requiring the ; in statements, as opposed to struct definitions would require a double bootstrap. We would first need to accept the semicolon, then bootstrap, then we can require it and add the missing ones, then bootstrap again later (not immediately required).
Let's be more precise: in which statements would the trailing semicolon be required?
structdefinitions: no- global struct/array initializers: yes/no?
- incremental struct/array additions: yes/no?
- local struct/array initializers: yes
- struct/array assignments: yes
- returning struct/array: yes
I pushed a patch to make the semicolons optional so you can bootstrap.
merged!