zig-spec
zig-spec copied to clipboard
Update grammar.y
Currently, grammar.y
is out of sync with the actual grammar implemented by the zig
compiler (in std/zig/tokenizer
and std/zig/parser.zig
).
These are the places where grammar.y
and zig
diverges, found with the improved parser from https://github.com/ziglang/zig-spec/pull/42, using files in src
and lib/std
in the zig
repository.
-
[ ] doc-comments not allowed in top-level
comptime
andtest
declaration./// doc-comment for comptime. comptime { var a = 1; _ = a; } /// doc-comment for test. test {}
-
[ ] Saturating arithmetic is not supported by
grammar.y
.test { var a: isize = 10; a +|= 1; a -|= 1; a *|= 1; a <<|= 1; const b: isize = a +| 1; const c: isize = b -| 1; const d: isize = c *| 1; const e: isize = d <<| 1; _ = e; }
-
[ ] Mixed
doc-comment
andline-comment
is not supported bygrammar.y
.// A doc-comment followed by a line-comment is not supported by grammar.y. const S = struct { //! doc /// doc // doc a: i32, };
NOTE: I found mixing
doc-comment
andline-comment
confusing, andautodoc
doesn't not handle them correctly.Examples:
-
std/mem.zig:3760
/// Force an evaluation of the expression; this tries to prevent /// the compiler from optimizing the computation away even if the /// result eventually gets discarded. // TODO: use @declareSideEffect() when it is available - https://github.com/ziglang/zig/issues/6168 pub fn doNotOptimizeAway(val: anytype) void {
See: https://ziglang.org/documentation/master/std/#root;mem.doNotOptimizeAway.
-
std/coff.zig:354
/// This relocation is meaningful only when the machine type is ARM or Thumb. /// The base relocation applies the 32-bit address of a symbol across a consecutive MOVW/MOVT instruction pair. // ARM_MOV32 = 5, /// This relocation is only meaningful when the machine type is RISC-V. /// The base relocation applies to the high 20 bits of a 32-bit absolute address. // RISCV_HIGH20 = 5, /// Reserved, must be zero. RESERVED = 6,
See https://ziglang.org/documentation/master/std/#root;coff.BaseRelocationType.
-
-
[ ] New
addrspace
keyword.Commit: ziglang/zig@ccc7f9987 (Address spaces: addrspace(A) parsing) Date: 2021-09-14
test { const y: *allowzero align(8) addrspace(.generic) const volatile u32 = undefined; _ = y; }
-
[ ] Inline switch prong not supported by
grammar.y
.Commit: ziglang/zig@b4d81857f (stage1+2: parse inline switch cases) Date: 2022-02-13
const std = @import("std"); const expect = std.testing.expect; const SliceTypeA = extern struct { len: usize, ptr: [*]u32, }; const SliceTypeB = extern struct { ptr: [*]SliceTypeA, len: usize, }; const AnySlice = union(enum) { a: SliceTypeA, b: SliceTypeB, c: []const u8, d: []AnySlice, }; fn withSwitch(any: AnySlice) usize { return switch (any) { // With `inline else` the function is explicitly generated // as the desired switch and the compiler can check that // every possible case is handled. inline else => |slice| slice.len, }; } test "inline else" { var any = AnySlice{ .c = "hello" }; try expect(withSwitch(any) == 5); }
-
[ ] New packed
struct
syntax.Commit: ziglang/zig@6249a24e8 (stage2: integer-backed packed structs) Date: 2022-02-23
pub const AbsolutePointerModeAttributes = packed struct(u32) { supports_alt_active: bool, supports_pressure_as_z: bool, _pad: u30 = 0, };
@Vexu, the grammar still fails with some source file in the zig
project.
Using the script https://gist.github.com/perillo/a1a731d234b425597a70412a7343b772, these are the errors:
compiler source
./parse.sh ../../zig/src/**/*.zig
:
parsing 128 files...
FAIL: ../../zig/src/arch/wasm/CodeGen.zig: 1
<stdin>:661: syntax error
///
FAIL: ../../zig/src/link/SpirV.zig: 1
<stdin>:19: syntax error
//
std source
./parse.sh ../../zig/lib/std/**/*.zig
:
parsing 479 files...
FAIL: ../../zig/lib/std/coff.zig: 1
<stdin>:356: syntax error
///
FAIL: ../../zig/lib/std/comptime_string_map.zig: 1
<stdin>:99: syntax error
const KV = struct { [
FAIL: ../../zig/lib/std/fmt/parse_float/convert_hex.zig: 1
<stdin>:3: syntax error
FAIL: ../../zig/lib/std/http/method.zig: 1
<stdin>:4: syntax error
//
FAIL: ../../zig/lib/std/meta.zig: 1
<stdin>:118: syntax error
const EnumKV = struct { [
In the previous list I forgot to add the new tuple type declaration.
- [ ] New tuple type declaration
Commit: ziglang/zig@30eb2a175
const KV = struct { []const u8, isize };
Thanks.
Done in 3aefd846ffb1dae997c6a715f81b4003a9072364
@Vexu, found more errors when checking the zig
test suite:
Compiler tests
./check_parser.sh ../../zig/test/**/*.zig
:
running 1456 tests...
FAIL: ../../zig/test/behavior/decltest.zig: zig: 0, grammar: 1
<stdin>:5: syntax error
test t
FAIL: ../../zig/test/behavior/tuple_declarations.zig: zig: 0, grammar: 1
<stdin>:13: syntax error
const T = struct { comptime u32 a
FAIL: ../../zig/test/cases/compile_errors/Issue_6823_dont_allow_._to_be_followed_by_.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/alignment_of_enum_field_specified.zig: zig: 0, grammar: 1
<stdin>:3: syntax error
b a
FAIL: ../../zig/test/cases/compile_errors/attempted_double_ampersand.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/chained_comparison_operators.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/invalid_empty_unicode_escape.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
const a = '\u{}
FAIL: ../../zig/test/cases/compile_errors/invalid_exponent_in_float_literal-1.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 0x1.0p1ab
FAIL: ../../zig/test/cases/compile_errors/invalid_exponent_in_float_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 0x1.0p50F
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-10.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1.0__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-11.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1.0e-1__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-12.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1_x
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-13.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 0x_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-14.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 0x0.0_p
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1_.
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-3.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 0.0_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-4.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1.0e_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-5.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1.0e+_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-6.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1.0e-_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-7.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1.0e-1_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-9.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: f128 = 1__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-1.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: u128 = 10_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: u128 = 0b0010_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-3.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: u128 = 0o0010_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-4.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
var bad: u128 = 0x0010_;
FAIL: ../../zig/test/cases/compile_errors/missing_digit_after_base.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
const x = @as(usize, -0x)
FAIL: ../../zig/test/cases/compile_errors/reject_extern_function_definitions_with_body.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/double_ampersand.0.zig: zig: 1, grammar: 0
Most of the errors are invalid literals; however an interesting case is:
pub fn the_add_function(a: u32, b: u32) u32 {
return a + b;
}
test the_add_function {
if (the_add_function(1, 2) != 3) unreachable;
}
decltest
was added in ziglang/zig@3bbe6a28e.
Thanks
Done in 11ed032665fca9c78144c11993cfe229d21a5bc0, I'm not sure whether this grammar should allow valid numbers or not so I didn't touch them.
In addition to number literals, the zig parser seems to be able to do additional analysis compared to grammar.y
, in order to handle ambiguous cases.
./check_parser.sh ../../zig/test/**/*.zig | grep "zig: 1"
:
FAIL: ../../zig/test/cases/compile_errors/Issue_6823_dont_allow_._to_be_followed_by_.zig: zig: 1, grammar: 0
var sequence = "repeat".*** 10;
FAIL: ../../zig/test/cases/compile_errors/attempted_double_ampersand.zig: zig: 1, grammar: 0
if (a && b) {
return 1234;
}
FAIL: ../../zig/test/cases/compile_errors/chained_comparison_operators.zig: zig: 1, grammar: 0
return 1 < value < 1000
FAIL: ../../zig/test/cases/compile_errors/reject_extern_function_definitions_with_body.zig: zig: 1, grammar: 0
extern "c" fn definitelyNotInLibC(a: i32, b: i32) i32 {
return a + b;
}
FAIL: ../../zig/test/cases/double_ampersand.0.zig: zig: 1, grammar: 0
pub const a = if (true && false) 1 else 2;
Those shouldn't be in the grammar IMO.
Do you think they should be documented in the Zig Language Reference?
I don't think it's strictly necessary but if you want to mention it somewhere then go ahead.
@Vexu, in order to check the PEG parser I wrote a tool (in Go, since it was more convenient) that downloads up to 1000 repositories using the GitHub API, and check each .zig
file with the check_parser.sh
script.
For now I found these issues:
-
[ ] A Windows style line ending is not supported
const std = @import("std")^M
https://github.com/ziglibs/zig-lv2/blob/master/examples/fifths/fifths.zig
check https://github.com/ziglibs/zig-lv2/blob/main/examples/fifths/fifths.zig : error: exit status 1 running 1 tests... FAIL: /data/cache/zig-package-index/346899503/examples/fifths/fifths.zig: zig: 0, grammar: 1 <stdin>:1: syntax error const std = @import("std");^M ^
-
[ ] Official grammar does not support the UTF-8 BOM
https://github.com/MasterQ32/zero-graphics/blob/master/src/gl_es_2v0.zig
check https://github.com/MasterQ32/zero-graphics/blob/master/src/gl_es_2v0.zig : error: exit status 1 running 1 tests... FAIL: /data/cache/zig-package-index/368337113/src/gl_es_2v0.zig: zig: 0, grammar: 1 <stdin>:1: syntax error <feff>// ^
NOTE:
<feff>
is Vim bomb. -
[ ] Alignment using tabs is not supported
test { var window = try capy.Window.init(); try window.set(capy.Column(.{}, .{ capy.Label(.{ .text = "Balls with attraction and friction" }), capy.Label(.{ }) .bind("text", totalEnergyFormat), capy.Align(.{}, &canvas), })); }
There are two tabs before
capy.Align(.{}, &canvas),
https://github.com/capy-ui/capy/blob/master/examples/balls.zig
check https://github.com/capy-ui/capy/blob/master/examples/balls.zig : error: exit status 1 running 1 tests... FAIL: /data/cache/zig-package-index/351056766/examples/balls.zig: zig: 0, grammar: 1 <stdin>:58: syntax error capy.Align(.{}, &canvas), ^
-
[ ] PEG parser crash
https://github.com/marlersoft/zigwin32/blob/master/win32/everything.zig
check marlersoft/zigwin32:everything.zig: error: exit status 1 running 1 tests... FAIL: /data/cache/zig-package-index/333504729/win32/everything.zig: zig: 0, grammar: 139
The culprit is probably the source file size: 20.6 MB
Found other issues:
-
[ ] Not sure where is the problem
pub const DmaTuple = struct { DmaController(0), DmaController(1), DmaController(2), DmaController(3) };
https://github.com/paoda/zba/blob/main/src/core/bus/dma.zig
check https://github.com/paoda/zba/blob/main/src/core/bus/dma.zig : error: exit status 1 running 1 tests... FAIL: /data/cache/zig-package-index/464707106/src/core/bus/dma.zig: zig: 0, grammar: 1 <stdin>:8: syntax error pub const DmaTuple = struct { DmaController(0), DmaController(1), DmaController(2), DmaController(3) }; ^
-
[ ] Official grammar does not report an error for "binary operator
xxx
has whitespace on one side, but not the other"const a = 1 +2;
https://github.com/wapc/wapc-guest-zig/blob/master/wapc.zig
check https://github.com/wapc/wapc-guest-zig/blob/master/wapc.zig : error: exit status 1 running 1 tests... FAIL: /data/cache/zig-package-index/207880856/wapc.zig: zig: 1, grammar: 0
Not sure if this is responsibility of the PEG grammar.
Here is the full report: https://gist.github.com/perillo/c33e79e6d95efabaf302ef3c5d5907ad
Thank you for taking care of the grammar. A few observations:
- I see that
ContainerDeclarations
is declared with right-hand recursion and an additional empty rule. Would it be more readable to refactor the rule and replace it with*
, like this?
diff --git a/grammar/grammar.y b/grammar/grammar.y
index ec08094..c5c47be 100644
--- a/grammar/grammar.y
+++ b/grammar/grammar.y
@@ -1,13 +1,12 @@
Root <- skip container_doc_comment? ContainerMembers eof
# *** Top level ***
-ContainerMembers <- ContainerDeclarations (ContainerField COMMA)* (ContainerField / ContainerDeclarations)
+ContainerMembers <- ContainerDeclaration* (ContainerField COMMA)* (ContainerField / ContainerDeclaration*)
-ContainerDeclarations
- <- TestDecl ContainerDeclarations
- / ComptimeDecl ContainerDeclarations
- / doc_comment? KEYWORD_pub? Decl ContainerDeclarations
- /
+ContainerDeclaration
+ <- TestDecl
+ / ComptimeDecl
+ / doc_comment? KEYWORD_pub? Decl
-
ascii_char_not_nl_slash_squote
rule specifies a weird range (see multiple dashes)[\000-\011\013-\046-\050-\133\135-\177]
- could it be clarified?
I see that ContainerDeclarations is declared with right-hand recursion and an additional empty rule. Would it be more readable to refactor the rule and replace it with
*
Seconded. Also IMHO container_doc_comment?
should be part of ContainerMembers so the container declaration expression needs not repeat it:
ContainerDeclAuto <- ContainerDeclType LBRACE container_doc_comment? ContainerMembers RBRACE
ascii_char_not_nl_slash_squote
rule specifies a weird range (see multiple dashes)[\000-\011\013-\046-\050-\133\135-\177]
- could it be clarified?
It skips \012
(newline), \047
(single quote) and \134
(backslash). Perhaps the name should be clarified that it's backslash instead of slash and sort them in order? BTW the dash in the middle was a typo and fixed in GH-52.