zig-spec icon indicating copy to clipboard operation
zig-spec copied to clipboard

Update grammar.y

Open perillo opened this issue 2 years ago • 12 comments

Currently, grammar.y is out of sync with the actual grammar implemented by the zig compiler (in std/zig/tokenizer and std/zig/parser.zig).

These are the places where grammar.y and zig diverges, found with the improved parser from https://github.com/ziglang/zig-spec/pull/42, using files in src and lib/std in the zig repository.

  • [ ] doc-comments not allowed in top-level comptime and test declaration.

    /// doc-comment for comptime.
    comptime {
        var a = 1;
        _ = a;
    }
    
    /// doc-comment for test.
    test {}
    
  • [ ] Saturating arithmetic is not supported by grammar.y.

    test {
        var a: isize = 10;
    
        a +|= 1;
        a -|= 1;
        a *|= 1;
        a <<|= 1;
    
        const b: isize = a +| 1;
        const c: isize = b -| 1;
        const d: isize = c *| 1;
        const e: isize = d <<| 1;
        _ = e;
    }
    
  • [ ] Mixed doc-comment and line-comment is not supported by grammar.y.

    // A doc-comment followed by a line-comment is not supported by grammar.y.
    const S = struct {
        //! doc
        /// doc
        // doc
        a: i32,
    };
    

    NOTE: I found mixing doc-comment and line-comment confusing, and autodoc doesn't not handle them correctly.

    Examples:

    • std/mem.zig:3760

      /// Force an evaluation of the expression; this tries to prevent
      /// the compiler from optimizing the computation away even if the
      /// result eventually gets discarded.
      // TODO: use @declareSideEffect() when it is available - https://github.com/ziglang/zig/issues/6168
      pub fn doNotOptimizeAway(val: anytype) void {
      

      See: https://ziglang.org/documentation/master/std/#root;mem.doNotOptimizeAway.

    • std/coff.zig:354

      /// This relocation is meaningful only when the machine type is ARM or Thumb.
      /// The base relocation applies the 32-bit address of a symbol across a consecutive MOVW/MOVT instruction pair.
      // ARM_MOV32 = 5,
      
      /// This relocation is only meaningful when the machine type is RISC-V.
      /// The base relocation applies to the high 20 bits of a 32-bit absolute address.
      // RISCV_HIGH20 = 5,
      
      /// Reserved, must be zero.
      RESERVED = 6,
      

      See https://ziglang.org/documentation/master/std/#root;coff.BaseRelocationType.

  • [ ] New addrspace keyword.

    Commit: ziglang/zig@ccc7f9987 (Address spaces: addrspace(A) parsing) Date: 2021-09-14

    test {
        const y: *allowzero align(8) addrspace(.generic) const volatile u32 = undefined;
        _ = y;
    }
    
  • [ ] Inline switch prong not supported by grammar.y.

    Commit: ziglang/zig@b4d81857f (stage1+2: parse inline switch cases) Date: 2022-02-13

    const std = @import("std");
    const expect = std.testing.expect;
    
    const SliceTypeA = extern struct {
        len: usize,
        ptr: [*]u32,
    };
    const SliceTypeB = extern struct {
        ptr: [*]SliceTypeA,
        len: usize,
    };
    const AnySlice = union(enum) {
        a: SliceTypeA,
        b: SliceTypeB,
        c: []const u8,
        d: []AnySlice,
    };
    
    fn withSwitch(any: AnySlice) usize {
        return switch (any) {
            // With `inline else` the function is explicitly generated
            // as the desired switch and the compiler can check that
            // every possible case is handled.
            inline else => |slice| slice.len,
        };
    }
    
    test "inline else" {
        var any = AnySlice{ .c = "hello" };
        try expect(withSwitch(any) == 5);
    }
    
  • [ ] New packed struct syntax.

    Commit: ziglang/zig@6249a24e8 (stage2: integer-backed packed structs) Date: 2022-02-23

    pub const AbsolutePointerModeAttributes = packed struct(u32) {
        supports_alt_active: bool,
        supports_pressure_as_z: bool,
        _pad: u30 = 0,
    };
    

perillo avatar Dec 26 '22 10:12 perillo

@Vexu, the grammar still fails with some source file in the zig project.

Using the script https://gist.github.com/perillo/a1a731d234b425597a70412a7343b772, these are the errors:

compiler source

./parse.sh ../../zig/src/**/*.zig:

parsing 128 files...
FAIL: ../../zig/src/arch/wasm/CodeGen.zig: 1
<stdin>:661: syntax error
/// 

FAIL: ../../zig/src/link/SpirV.zig: 1
<stdin>:19: syntax error
// 

std source

./parse.sh ../../zig/lib/std/**/*.zig:

parsing 479 files...
FAIL: ../../zig/lib/std/coff.zig: 1
<stdin>:356: syntax error
    /// 

FAIL: ../../zig/lib/std/comptime_string_map.zig: 1
<stdin>:99: syntax error
    const KV = struct { [

FAIL: ../../zig/lib/std/fmt/parse_float/convert_hex.zig: 1
<stdin>:3: syntax error

FAIL: ../../zig/lib/std/http/method.zig: 1
<stdin>:4: syntax error
// 

FAIL: ../../zig/lib/std/meta.zig: 1
<stdin>:118: syntax error
            const EnumKV = struct { [

In the previous list I forgot to add the new tuple type declaration.

  • [ ] New tuple type declaration Commit: ziglang/zig@30eb2a175
    const KV = struct { []const u8, isize };
    

Thanks.

perillo avatar Dec 26 '22 17:12 perillo

Done in 3aefd846ffb1dae997c6a715f81b4003a9072364

Vexu avatar Dec 27 '22 11:12 Vexu

@Vexu, found more errors when checking the zig test suite:

Compiler tests

./check_parser.sh ../../zig/test/**/*.zig:

running 1456 tests...
FAIL: ../../zig/test/behavior/decltest.zig: zig: 0, grammar: 1
<stdin>:5: syntax error
test t
FAIL: ../../zig/test/behavior/tuple_declarations.zig: zig: 0, grammar: 1
<stdin>:13: syntax error
        const T = struct { comptime u32 a
FAIL: ../../zig/test/cases/compile_errors/Issue_6823_dont_allow_._to_be_followed_by_.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/alignment_of_enum_field_specified.zig: zig: 0, grammar: 1
<stdin>:3: syntax error
    b a
FAIL: ../../zig/test/cases/compile_errors/attempted_double_ampersand.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/chained_comparison_operators.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/invalid_empty_unicode_escape.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    const a = '\u{}
FAIL: ../../zig/test/cases/compile_errors/invalid_exponent_in_float_literal-1.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0x1.0p1ab
FAIL: ../../zig/test/cases/compile_errors/invalid_exponent_in_float_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0x1.0p50F
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-10.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-11.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e-1__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-12.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1_x
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-13.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0x_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-14.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0x0.0_p
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1_.
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-3.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0.0_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-4.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-5.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e+_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-6.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e-_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-7.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e-1_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-9.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-1.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: u128 = 10_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: u128 = 0b0010_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-3.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: u128 = 0o0010_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-4.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: u128 = 0x0010_;
FAIL: ../../zig/test/cases/compile_errors/missing_digit_after_base.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    const x = @as(usize, -0x)
FAIL: ../../zig/test/cases/compile_errors/reject_extern_function_definitions_with_body.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/double_ampersand.0.zig: zig: 1, grammar: 0

Most of the errors are invalid literals; however an interesting case is:

pub fn the_add_function(a: u32, b: u32) u32 {
    return a + b;
}

test the_add_function {
    if (the_add_function(1, 2) != 3) unreachable;
}

decltest was added in ziglang/zig@3bbe6a28e.

Thanks

perillo avatar Dec 27 '22 13:12 perillo

Done in 11ed032665fca9c78144c11993cfe229d21a5bc0, I'm not sure whether this grammar should allow valid numbers or not so I didn't touch them.

Vexu avatar Dec 27 '22 14:12 Vexu

In addition to number literals, the zig parser seems to be able to do additional analysis compared to grammar.y, in order to handle ambiguous cases.

./check_parser.sh ../../zig/test/**/*.zig | grep "zig: 1":

FAIL: ../../zig/test/cases/compile_errors/Issue_6823_dont_allow_._to_be_followed_by_.zig: zig: 1, grammar: 0
    var sequence = "repeat".*** 10;

FAIL: ../../zig/test/cases/compile_errors/attempted_double_ampersand.zig: zig: 1, grammar: 0
    if (a && b) {
        return 1234;
    }

FAIL: ../../zig/test/cases/compile_errors/chained_comparison_operators.zig: zig: 1, grammar: 0
    return 1 < value < 1000

FAIL: ../../zig/test/cases/compile_errors/reject_extern_function_definitions_with_body.zig: zig: 1, grammar: 0
    extern "c" fn definitelyNotInLibC(a: i32, b: i32) i32 {
        return a + b;
    }

FAIL: ../../zig/test/cases/double_ampersand.0.zig: zig: 1, grammar: 0
    pub const a = if (true && false) 1 else 2;

perillo avatar Dec 27 '22 15:12 perillo

Those shouldn't be in the grammar IMO.

Vexu avatar Dec 27 '22 15:12 Vexu

Do you think they should be documented in the Zig Language Reference?

perillo avatar Dec 27 '22 15:12 perillo

I don't think it's strictly necessary but if you want to mention it somewhere then go ahead.

Vexu avatar Dec 27 '22 18:12 Vexu

@Vexu, in order to check the PEG parser I wrote a tool (in Go, since it was more convenient) that downloads up to 1000 repositories using the GitHub API, and check each .zig file with the check_parser.sh script.

For now I found these issues:

  • [ ] A Windows style line ending is not supported

    const std = @import("std")^M
    

    https://github.com/ziglibs/zig-lv2/blob/master/examples/fifths/fifths.zig

    check https://github.com/ziglibs/zig-lv2/blob/main/examples/fifths/fifths.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/346899503/examples/fifths/fifths.zig: zig: 0, grammar: 1
    <stdin>:1: syntax error
    const std = @import("std");^M
                               ^
    
  • [ ] Official grammar does not support the UTF-8 BOM

    https://github.com/MasterQ32/zero-graphics/blob/master/src/gl_es_2v0.zig

    check https://github.com/MasterQ32/zero-graphics/blob/master/src/gl_es_2v0.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/368337113/src/gl_es_2v0.zig: zig: 0, grammar: 1
    <stdin>:1: syntax error
    <feff>//
    ^
    

    NOTE: <feff> is Vim bomb.

  • [ ] Alignment using tabs is not supported

    test {
        var window = try capy.Window.init();
        try window.set(capy.Column(.{}, .{
            capy.Label(.{ .text = "Balls with attraction and friction" }),
            capy.Label(.{ })
                    .bind("text", totalEnergyFormat),
                    capy.Align(.{}, &canvas),
        }));
    }
    

    There are two tabs before capy.Align(.{}, &canvas),

    https://github.com/capy-ui/capy/blob/master/examples/balls.zig

    check https://github.com/capy-ui/capy/blob/master/examples/balls.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/351056766/examples/balls.zig: zig: 0, grammar: 1
    <stdin>:58: syntax error
    		capy.Align(.{}, &canvas),
    ^
    
  • [ ] PEG parser crash

    https://github.com/marlersoft/zigwin32/blob/master/win32/everything.zig

    check marlersoft/zigwin32:everything.zig: error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/333504729/win32/everything.zig: zig: 0, grammar: 139
    

    The culprit is probably the source file size: 20.6 MB

perillo avatar Jan 09 '23 15:01 perillo

Found other issues:

  • [ ] Not sure where is the problem

    pub const DmaTuple = struct { DmaController(0), DmaController(1), DmaController(2), DmaController(3) };
    

    https://github.com/paoda/zba/blob/main/src/core/bus/dma.zig

    check https://github.com/paoda/zba/blob/main/src/core/bus/dma.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/464707106/src/core/bus/dma.zig: zig: 0, grammar: 1
    <stdin>:8: syntax error
    pub const DmaTuple = struct { DmaController(0), DmaController(1), DmaController(2), DmaController(3) };
                                               ^
    
  • [ ] Official grammar does not report an error for "binary operator xxx has whitespace on one side, but not the other"

    const a = 1 +2;
    

    https://github.com/wapc/wapc-guest-zig/blob/master/wapc.zig

    check https://github.com/wapc/wapc-guest-zig/blob/master/wapc.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/207880856/wapc.zig: zig: 1, grammar: 0
    

    Not sure if this is responsibility of the PEG grammar.

Here is the full report: https://gist.github.com/perillo/c33e79e6d95efabaf302ef3c5d5907ad

perillo avatar Jan 09 '23 17:01 perillo

Thank you for taking care of the grammar. A few observations:

  • I see that ContainerDeclarations is declared with right-hand recursion and an additional empty rule. Would it be more readable to refactor the rule and replace it with *, like this?
diff --git a/grammar/grammar.y b/grammar/grammar.y
index ec08094..c5c47be 100644
--- a/grammar/grammar.y
+++ b/grammar/grammar.y
@@ -1,13 +1,12 @@
 Root <- skip container_doc_comment? ContainerMembers eof
 
 # *** Top level ***
-ContainerMembers <- ContainerDeclarations (ContainerField COMMA)* (ContainerField / ContainerDeclarations)
+ContainerMembers <- ContainerDeclaration* (ContainerField COMMA)* (ContainerField / ContainerDeclaration*)
 
-ContainerDeclarations
-    <- TestDecl ContainerDeclarations
-     / ComptimeDecl ContainerDeclarations
-     / doc_comment? KEYWORD_pub? Decl ContainerDeclarations
-     /
+ContainerDeclaration
+    <- TestDecl
+     / ComptimeDecl
+     / doc_comment? KEYWORD_pub? Decl
  • ascii_char_not_nl_slash_squote rule specifies a weird range (see multiple dashes) [\000-\011\013-\046-\050-\133\135-\177] - could it be clarified?

anatol avatar Jan 23 '23 17:01 anatol

I see that ContainerDeclarations is declared with right-hand recursion and an additional empty rule. Would it be more readable to refactor the rule and replace it with *

Seconded. Also IMHO container_doc_comment? should be part of ContainerMembers so the container declaration expression needs not repeat it:

ContainerDeclAuto <- ContainerDeclType LBRACE container_doc_comment? ContainerMembers RBRACE

ascii_char_not_nl_slash_squote rule specifies a weird range (see multiple dashes) [\000-\011\013-\046-\050-\133\135-\177] - could it be clarified?

It skips \012 (newline), \047 (single quote) and \134 (backslash). Perhaps the name should be clarified that it's backslash instead of slash and sort them in order? BTW the dash in the middle was a typo and fixed in GH-52.

McSinyx avatar Apr 03 '24 09:04 McSinyx