c2compiler Compiler: accept initializer lists in assignments, extend enum scoping

InitList expressions can be used in assignments, function arguments and return values.
add Expr.isAssignment() and Expr.isInitlistAssignment() to simplify tests
enum values are automatically scoped in the same places and in comparisons

Jun 08 '25 12:06 chqrlie

I lot to consider. I'm trying to come up with corner cases we can put in tests..

It would be nice to fix the TODO semi (c2_parser_expr.c2), otherwise it can remain open.

Jun 14 '25 08:06 bvdberg

This commit is probably the biggest language change so far (not code-wise, but impact-wise). I like the use when using simple structs, like point. So instead of

// old
Point p = { 2, 3 }
return p;
//new
return { 2, 3 }

For enums, there are some caveats: Color c = Red; // okay, was already possible before this patch.

c = Blue; // possible since this patch if (c == Blue) // Error: use of undeclared identifier 'Blue'. I wouldn't mind enforcing Color.Blue in both case above, since Color only appears once in the line (in contrast to the init).. Or at least, if line 1 words, line 2 should also work

setColor(Red); // now possible with this patch. Forcing prefix would not be problematic and more explicit..

Jun 15 '25 13:06 bvdberg

This commit is probably the biggest language change so far (not code-wise, but impact-wise). I like the use when using simple structs, like point. So instead of
// old
Point p = { 2, 3 }
return p;
//new
return { 2, 3 }

If we ever support compound literals, a third option might be more readable:

    return (Point){ 2, 3 }

For enums, there are some caveats:

Color c = Red; // okay, was already possible before this patch.

c = Blue; // possible since this patch,

if (c == Blue) // Error: use of undeclared identifier 'Blue'. I wouldn't mind enforcing Color.Blue in both cases above, since Color only appears once in the line (in contrast to the init).. Or at least, if line 2 works, line 3 should also work

I agree line 3 should also work, as well as if (kind >= KW_bool && kind <= KW_void)

setColor(Red); // now possible with this patch. Forcing prefix would not be problematic and more explicit...

setColor(Red) is more elegant than setColor(Color.Red) where Color is repeated and feels redundant.

If the enum type is not obvious, specifying the type makes it more readable and is of course possible: whether to write SetFontAttributes(12, Style.Bold, Color.Red) or SetFontAttributes(12, Bold, Red) is a matter of style or personal preference. Both are possible and supported.

In another case, SetFontAttributes(12, Default, Default) is less readable than SetFontAttributes(12, Style.Default, Color.Default), although both would al be accepted and correct even if Style.Default and Color.Default have a different value.

Jun 15 '25 17:06 chqrlie

Minor typo in commit msg: intiializers (makes git log nicer)

Jun 16 '25 05:06 bvdberg

Minor typo in commit msg: intiializers (makes git log nicer)

Yes, I forgot to fix that. Changed the commit line to:

Compiler: accept initializer lists in assignments, extend enum scoping

Jun 16 '25 06:06 chqrlie

I was looking through the code and in analyseBinaryOperator() there is a check if opcode == Assign. That got me thinking about the other assign (+= -= /= etc). Those currently dont work without a prefix. It is questionable whether this would be useful for enums. Enum += would be ok, but Enum += Enum-Constant would be weird.

There is also the issue of = { .. } ; where the semicolon doesn't give an error (it should).

What about?: Point p = { 1, 2 } p = { 3, 4 } = { 5, 6 }

(i pushed a change to master to get a better error location for that)

Jun 16 '25 07:06 bvdberg

I was looking through the code and in analyseBinaryOperator() there is a check if opcode == Assign. That got me thinking about the other assign (+=, -=, /= etc). Those currently dont work without a prefix. It is questionable whether this would be useful for enums. Enum += would be ok, but Enum += Enum-Constant would be weird.

I agree, I cannot think of a good use case for enum scoping for these operators. The ternary operator should support my_enum = condition ? Enum_const1 : Enum_const2; without a prefix. Arrays could be defined to take an enum value as an index and [Enum_const] would be scoped without a prefix, but we would need a specific declaration syntax and I think it would be more useful to support the C3 enum attributes instead.

There is also the issue of var = { .. } ; where the semicolon doesn't give an error (it should).

I am not sure it should give an error: The ; should be ignored as it is redundant and this should be implemented generically for expression statements by looking at the last token or using a flag in the Parser object. There are corner cases where the ; help readability.

What about?: Point p = { 1, 2 } p = { 3, 4 } = { 5, 6 }

This should give an error on the second =. But Point p1; Point p2; p1 = p2 = { 1, 2 } should work.

(i pushed a change to master to get a better error location for that)

I'll take a look.

Jun 16 '25 08:06 chqrlie

I was looking through the code and in analyseBinaryOperator() there is a check if opcode == Assign. That got me thinking about the other assign (+=, -=, /= etc). Those currently dont work without a prefix. It is questionable whether this would be useful for enums. Enum += would be ok, but Enum += Enum-Constant would be weird.

I agree, I cannot think of a good use case for enum scoping for these operators. The ternary operator should support my_enum = condition ? Enum_const1 : Enum_const2; without a prefix. Arrays could be defined to take an enum value as an index and [Enum_const] would be scoped without a prefix, but we would need a specific declaration syntax and I think it would be more useful to support the C3 enum attributes instead.

There is also the issue of var = { .. } ; where the semicolon doesn't give an error (it should).

I am not sure it should give an error: The ; should be ignored as it is redundant and this should be implemented generically for expression statements by looking at the last token or using a flag in the Parser object. There are corner cases where the ; help readability.

What about?: Point p = { 1, 2 } p = { 3, 4 } = { 5, 6 }

This should give an error on the second =. But Point p1; Point p2; p1 = p2 = { 1, 2 } should work. It complains about missing a semi-colon after } :-). We should add unit tests for those situations

(i pushed a change to master to get a better error location for that)

I'll take a look.

Jun 16 '25 08:06 bvdberg

(i pushed a change to master to get a better error location for that)

The error code is still somewhat confusing:

point.c2:32:7: error: lvalue required as left operand of assignment
    p = { 1, 2 } = { 3, 4 }
    ~~^~~~~~~~~~

The assignment the error message refers to is the second =, not the one pointed to by the ^. The grammar is incorrect: p = { 1, 2 } should be considered a complete expression that cannot participate in another assignment, nor any operator with a lesser priority. A specific test might be needed.

Here is another example:

point.c2:39:19: error: invalid operands to binary expression ('point.Point' and 'i32')
    p1 = { 1, 2 } + 1;
                  ^
point.c2:39:19: error: expression without effect
    p1 = { 1, 2 } + 1;
    ~~~~~~~~~~~~~~^~~

Even worse:

fn void test(char *s) {
    Point p1;
    p1 = { 1, 2 }
    *s = 'a';
}

Gives this surprising error:

point.c2:44:5: error: invalid operands to binary expression ('point.Point' and 'char*')
    *s = 'a';
    ^

I'll go back to the drawing board and change the way initlists are parsed in assignment expressions.

Jun 16 '25 08:06 chqrlie

I would prefer to keep patches as small as possible, since reviewing smaller ones is a lot easier... So a complete overhaul of the expr parsing would probably be too much..

Jun 16 '25 10:06 bvdberg

I would prefer to keep patches as small as possible, since reviewing smaller ones is a lot easier... So a complete overhaul of the expr parsing would probably be too much..

I just fixed these issues with minimal changes and more explicit error messages :)

Jun 16 '25 11:06 chqrlie

Many semicolon issues seems to pop up on this branch.

(global) Point p = { 1, 2 } ; // should be error

(in function) Point p = { 1, 2 }; // should be error p = {1,2 }; // should be error

Maybe you could also modify the branch in a 2nd commit, that way it's easier to see what changed for me..

The Tokenizer.charAt() is really nasty, since it breaks the clean interface of tokens... I'll see if I can find a nice solution. Update: I pushed a solution to branch initlists (on top of your branch). Update 2: it's still possible to find corner cases, maybe we need to change the rules, so that any initlist is not followed by a semi-colon and these Init-list Expr are?

return { 2, 3 };
return g_point = { 2, 3 };

or

return { 2, 3 }
return g_point = { 2, 3 }

Jun 16 '25 17:06 bvdberg

I missed your last message before pushing my latest version. Let me merge your changes and rework this.

Jun 17 '25 09:06 chqrlie

The Tokenizer.charAt() is really nasty, since it breaks the clean interface of tokens... I'll see if I can find a nice solution.

A cleaner solution is to store the kind of the last token in the Parser object and test that.

Jun 17 '25 09:06 chqrlie

The Tokenizer.charAt() is really nasty, since it breaks the clean interface of tokens... I'll see if I can find a nice solution.

A cleaner solution is to store the kind of the last token in the Parser object and test that.

yes, but that would mean storing the last one for every token... way too slow IMHO..

Jun 17 '25 09:06 bvdberg

The Tokenizer.charAt() is really nasty, since it breaks the clean interface of tokens... I'll see if I can find a nice solution.

A cleaner solution is to store the kind of the last token in the Parser object and test that.

yes, but that would mean storing the last one for every token... way too slow IMHO..

We already compute p.prev_loc = p.tok.loc + p.tok.len for every token. Storing the first 64-bits of the last token or even the full token might not be that expensive. Only testing can show one way or another.

Jun 17 '25 09:06 chqrlie

Maybe you could also modify the branch in a 2nd commit, that way it's easier to see what changed for me..

I rebased the branch on the latest master. I just pushed a patch to produce a more explicit error message for redundant trailing semicolon: if the ; appears just after the last token (RBrace if not need_semi) then the error message becomes semicolon is not accepted after initializerinstead ofexpected expression` which is somewhat puzzling.

I still think that these redundant semicolons should just be accepted and ignored though.

Jun 18 '25 05:06 chqrlie

Yes, changes in a new commit are much easier to follow for me. I think the new error message is indeed an improvement.

Language wise, i prefer to keep the variations in syntax minimal. So making semicolons optional is not good then. Either they are required or forbidden. I think allowing semicolons after all statements is ok. This does create a small difference with globals (where they are forbidden). That is the price of the init-list in expressions.

Jun 18 '25 06:06 bvdberg

Language wise, i prefer to keep the variations in syntax minimal. So making semicolons optional is not good then. Either they are required or forbidden. I think allowing semicolons after all statements is ok. This does create a small difference with globals (where they are forbidden). That is the price of the init-list in expressions.

I agree. This would definitely simplify the parser since need_semi would no longer be needed (in most places, possibly all).

Jun 18 '25 07:06 chqrlie

Language wise, i prefer to keep the variations in syntax minimal. So making semicolons optional is not good then. Either they are required or forbidden. I think allowing semicolons after all statements is ok. This does create a small difference with globals (where they are forbidden). That is the price of the init-list in expressions.

I agree. This would definitely simplify the parser since need_semi would no longer be needed (in most places, possibly all).

ok lets make it so. Shall I change it on top of your commit or will you?

Jun 18 '25 07:06 bvdberg

Language wise, i prefer to keep the variations in syntax minimal. So making semicolons optional is not good then. Either they are required or forbidden. I think allowing semicolons after all statements is ok. This does create a small difference with globals (where they are forbidden). That is the price of the init-list in expressions.

I agree. This would definitely simplify the parser since need_semi would no longer be needed (in most places, possibly all).

ok lets make it so. Shall I change it on top of your commit or will you?

I'll give it a try.

Jun 18 '25 07:06 chqrlie

ok lets make it so. Shall I change it on top of your commit or will you?

I'll give it a try.

Well this is a non trivial change: requiring the ; in statements, as opposed to struct definitions would require a double bootstrap. We would first need to accept the semicolon, then bootstrap, then we can require it and add the missing ones, then bootstrap again later (not immediately required).

Let's be more precise: in which statements would the trailing semicolon be required?

struct definitions: no
global struct/array initializers: yes/no?
incremental struct/array additions: yes/no?
local struct/array initializers: yes
struct/array assignments: yes
returning struct/array: yes

Jun 18 '25 07:06 chqrlie

I pushed a patch to make the semicolons optional so you can bootstrap.

Jun 18 '25 13:06 chqrlie

merged!

Jun 20 '25 05:06 bvdberg