Fix array compound literal parsing
Summary
Fix a segfault when evaluating array compound literals like (int[]){1,2,3,4,5} in expression context. The literal now yields the temporary array’s address (via decay) instead of collapsing to a single element, so indexing and reads work correctly.
Motivation Previously, code like the snippet below crashed because the array literal was reduced to a scalar and later treated as a pointer.
Reproduction (manual)
#include <stdio.h>
int main(void) {
int *a = (int[]){1,2,3,4,5}; // used to crash later
printf("Testing basic array compound literal:\n");
for (int i = 0; i < 5; i++)
printf(" a[%d] = %d\n", i, a[i]);
int sum = a[0] + a[4];
printf("Sum = %d (expect 6)\n", sum);
return 0;
}
Before: segfault
After: prints a[0..4] and Sum = 6 as expected
Approach (high level)
- Ensure
(type[]){…}produces a real temporary array object and the expression value decays to its address. - Only collapse to a scalar when a scalar is actually required by context.
- No changes to struct compound literals or plain scalars.
Scope
- This PR addresses the array side of #299 sufficient to eliminate the segfault in basic use.
- Struct compound literals and broader initializer behaviors are intentionally out of scope here.
Tests
- No test files are added in this PR. The manual snippet above demonstrates the fix.
Compatibility
- Restores standard pointer semantics for array compound literals.
- No behavioral changes for structs or scalars.
Issue
- Partially addresses #299 (array compound literals).
Summary by cubic
Fix parsing of array compound literals so they allocate a temporary array and decay to its address, preserving pointer semantics and preventing segfaults.
-
Bug Fixes
- Eliminates crashes when using (int[]){...} in expressions, assignments, and parameter passing.
- Correct behavior across binary ops, pointer arithmetic, ternary results, and function-call (incl. variadic) arguments.
- Zero-length array literals yield 0 for scalar uses to avoid garbage reads.
- Addresses #299.
-
Refactors
- Added parse_array_compound_literal and scalarize_array_literal to centralize handling; scalarize only when a scalar is required.
- Emits element writes and counts initializers; tracks brace consumption to avoid double reads.
- Struct and scalar compound literals remain unchanged.
Written for commit a4aba549e0f1755ad4105ffb4fc503a77922a17b. Summary will update automatically on new commits.
You MUST ensure bootstrapping is fully functional before submitting pull requests.
IIUC, compound literals are a feature supported only since C99, and the shecc README mentions from the very beginning that this project aims to support ANSI C. Therefore, IMO, this at least does not "fix" anything.
IIUC, compound literals are a feature supported only since C99, and the shecc README mentions from the very beginning that this project aims to support ANSI C. Therefore, IMO, this at least does not "fix" anything.
As far as I know, shecc is planned to fully support the C99 standard, so I think the term "fix" is acceptable.
Fine, but since we haven't claimed to fully support C99 and, IIUC, array compound literals were never supported before, this seems more like supporting a new feature to me, rather than fixing an existing problem.
In fact, shecc has ability to handle array compound literals, but it only captures the first element (#299).
Therefore, this pull request specifically aims to fix it.
In fact, shecc has ability to handle array compound literals, but it only captures the first element (#299).
Therefore, this pull request specifically aims to fix it.
Thanks, that resolves my doubt. However, should we also mention the issue in the first commit?
Looking at the changes, the core logic for handling array compound literals looks solid—good job centralizing the decay behavior to avoid ad-hoc fixes everywhere. But yeah, there are some style inconsistencies and comment opportunities that could make this even tighter. Here's what I'd suggest, focused on parser.c (the bulk of the changes) and the test additions. I'll keep it practical: specific line tweaks for consistency with the rest of the codebase (e.g., brace placement, naming, spacing) and punchier comments.
Style Fixes for parser.c
-
Brace placement and indentation in new functions:
- In
parse_array_compound_literal, theif (!lex_peek(T_close_curly, NULL))block has inconsistent indentation for the innerforloop—align it to match the function's style (2-space indents). Also, add braces around single-statement blocks for consistency (e.g., thecount++line).
This matches the style inif (!lex_peek(T_close_curly, NULL)) { for (;;) { read_expr(parent, bb); read_ternary_operation(parent, bb); var_t *value = opstack_pop(); if (count == 0) var->init_val = value->init_val; if (emit_code) { // ... (existing code) } count++; // Add braces if expanding later if (!lex_accept(T_comma)) break; if (lex_peek(T_close_curly, NULL)) break; } }read_exprand avoids future bugs.
- In
-
Variable naming consistency:
vdis overused as a generic var_t pointer—rename to something descriptive inread_ternary_operation(e.g.,result_varfor the final vd). It's clear in context, butvdfeels like a holdover from declarations elsewhere.var_t *result_var = require_var(parent); gen_name_to(result_var->var_name); add_insn(parent, then_, OP_assign, result_var, true_val, NULL, 0, NULL); add_insn(parent, else_, OP_assign, result_var, false_val, NULL, 0, NULL); // ... result_var->is_ternary_ret = true; opstack_push(result_var);- Similarly, in
read_body_assignment,rhs_valandrhsare redundant—userhsconsistently and drop the intermediate.
-
Spacing and operator alignment:
- In
is_array_literal_placeholder, tighten the return condition:return var && var->array_size > 0 && !var->ptr_level && var->var_name[0] == '.'; - In binary op handling (around line 2908), the new
rs1_is_ptr_likechecks have good logic but uneven spacing—align the bool defs:bool rs1_is_ptr_like = rs1 && (rs1->ptr_level || rs1->array_size); bool rs2_is_ptr_like = rs2 && (rs2->ptr_level || rs2->array_size); - Remove extra blank lines after function defs (e.g., after
scalarize_array_literal) to match the compact style in the original parser.
- In
-
Minor: Unused params and locals:
- In
scalarize_array_literal,hint_typedefaults toTY_intif invalid, but the null check forelem_typecould be simplified:type_t *elem_type = hint_type ? hint_type : array_var->type ? array_var->type : TY_int;. Drops the nested if. var->init_val = 0;in a few spots (e.g., zero-length handling) is fine, but ensure it's set consistently—it's already good, just a nit.
- In
Comment Improvements
The inline comments are mostly helpful, but some are wordy or could be more precise. Aim for concise, factual notes that explain why not just what.
-
Update the placeholder comment (around line 1364):
- Current: "Identify compiler-emitted temporaries that hold array compound literals. Parsing assigns these temporaries synthetic names via gen_name_to (".tN") and they keep array metadata without pointer indirection."
- Suggested: "Detect compiler temporaries ('.tN' names) holding array compound literals. These retain array metadata (size/type) without decaying to pointers until needed."
- Shorter, focuses on the key invariant (no auto-decay).
-
Add a brief header comment to
parse_array_compound_literal:/* Parse array compound literal contents: emit element stores to a temporary array. * Counts initializers and consumes the closing brace. Used only for array types. */ void parse_array_compound_literal(var_t *var, /* ... */)- Explains purpose without repeating the PR description.
-
Clarify scalarization calls:
- In function params (around line 1650), add:
// Decay array literals to scalars for non-pointer params, per C semantics. - In ternary handling:
// Scalarize array literals if the other branch isn't pointer-like, to match types. - In assignment:
// Decay RHS array literal to scalar if target is non-pointer (e.g., int x = (int[]){1}; uses first elem).- These tie back to the motivation (avoiding scalar collapse in pointer contexts) without verbosity.
- In function params (around line 1650), add:
-
Zero-length handling (in read_expr_operand):
- Current implicit via
if (compound_var->array_size == 0). Add:// Zero-length arrays default to 0 constant (avoids garbage reads in scalar contexts).
- Current implicit via
For Tests in driver.sh
- The new tests are great for coverage, but add a comment block at the top of the array sections:
# Array compound literal tests: verify decay to pointer in assignments and args - In the sum test, the
0, 0padding is clever for partial init, but add an expected output comment:# Expects sum=6 from first 3 elements. - Consistent with existing tests: Use
try_ 0everywhere (you did), and keep EOF indents clean.
These tweaks keep the code readable and maintainable without overhauling anything. Total changes: ~20 lines. Run a style check (assuming you have one) after, and it'll align nicely with the rest of the parser. If you push an updated commit, I can re-review the diffs.
I defer to @ChAoSUnItY for confirmation.
@lorettayao You can reply to or chat with @cubic-dev-ai so the AI bot can learn from your feedback and provide smarter code reviews over time.
The CI error is caused by IR regression test, if you happened to have this test failed, you can execute make update-snapshots to update the IR snapshots, then push the updated snapshots so that the test would proceed.
The CI error is caused by IR regression test, if you happened to have this test failed, you can execute
make update-snapshotsto update the IR snapshots, then push the updated snapshots so that the test would proceed.
@ChAoSUnItY When i update-snapshots, there are only the fib-riscv.json & hello-riscv.json been updated, is that normal? I wonder if the CI fail is because I did not update fib-arm.json and hello-arm.json.
there are only the fib-riscv.json & hello-riscv.json been updated, is that normal?
Sometimes some changes only affect certain backend, which is normal. If you're not sure if this is correct, you can perform make check-snapshots to see the result first on local machine, if it succeeds, then it should be correct on CI, too.
@jserv I have rebased to master, and I saw my branch is up to date with 'origin/master'. I am confused if the conflict is because of the rebase issue.
I guess your origin points to your own repository on github instead of the upstream repository under sysprog21?
I guess your origin points to your own repository on github instead of the upstream repository under sysprog21?
yes! Thank you for pointing out!
@lorettayao, note that the commit message body should be wrapped at 72 characters.