zig icon indicating copy to clipboard operation
zig copied to clipboard

zig fmt: Removal of blank line after doc comment not indempotent

Open linusg opened this issue 2 years ago • 4 comments
trafficstars

Zig Version

0.11.0-dev.3031+f40539e5d

Steps to Reproduce and Observed Behavior

While removing a blank line after a doc comment zig fmt introduces a newline above said comment, which then is shrunken down to one when formatting again:

$ cat test.zig 
fn foo() void {}

/// bar

fn bar() void {}
$ zig fmt test.zig 
test.zig
$ cat test.zig
fn foo() void {}


/// bar
fn bar() void {}
$ zig fmt test.zig
test.zig
$ cat test.zig
fn foo() void {}

/// bar
fn bar() void {}
$ zig fmt test.zig
$ 

Expected Behavior

$ cat test.zig 
fn foo() void {}

/// bar

fn bar() void {}
$ zig fmt test.zig 
test.zig
$ cat test.zig
fn foo() void {}

/// bar
fn bar() void {}
$ zig fmt test.zig
$ 

linusg avatar May 10 '23 18:05 linusg

I'm working on this now

ghost avatar May 11 '23 18:05 ghost

I believe I've found the source of the problem in lib/std/zig/render.zig:

/// Render all members in the given slice, keeping empty lines where appropriate
fn renderMembers(gpa: Allocator, ais: *Ais, tree: Ast, members: []const Ast.Node.Index) Error!void {
    if (members.len == 0) return;
    const container: Container = for (members) |member| {
        if (tree.fullContainerField(member)) |field| if (!field.ast.tuple_like) break .other;
    } else .tuple;
    try renderMember(gpa, ais, tree, container, members[0], .newline);
    for (members[1..]) |member| {
        try renderExtraNewline(ais, tree, member);
        try renderMember(gpa, ais, tree, container, member, .newline);
    }
}

This loop puts a newline between each member in the file if there was one in the source. But later (called from inside renderMember)...

/// end_token is the token one past the last doc comment token. This function
/// searches backwards from there.
fn renderDocComments(ais: *Ais, tree: Ast, end_token: Ast.TokenIndex) Error!void {
    // Search backwards for the first doc comment.
    const token_tags = tree.tokens.items(.tag);
    if (end_token == 0) return;
    var tok = end_token - 1;
    while (token_tags[tok] == .doc_comment) {
        if (tok == 0) break;
        tok -= 1;
    } else {
        tok += 1;
    }
    const first_tok = tok;
    if (first_tok == end_token) return;

    if (first_tok != 0) {
        const prev_token_tag = token_tags[first_tok - 1];

        // Prevent accidental use of `renderDocComments` for a function argument doc comment
        assert(prev_token_tag != .l_paren);

        if (prev_token_tag != .l_brace) {
            try renderExtraNewlineToken(ais, tree, first_tok);
        }
    }

    while (token_tags[tok] == .doc_comment) : (tok += 1) {
        try renderToken(ais, tree, tok, .newline);
    }
}

The call to renderExtraNewlineToken re-emits the same newline.

ghost avatar May 11 '23 22:05 ghost

It's a pain to fix because the AST renderer is kludgily designed. A lot of its logic is based on referencing the original raw source instead of the AST, which causes this duplicative behavior. I was able to suppress the symptom by using a global variable to represent whether a newline had already been omitted, but that's obviously not an acceptable solution. I think the right solution is for the newlines to be part of the AST, but I don't want to make a design change without approval from zig project leaders.

ghost avatar May 14 '23 21:05 ghost

using a global variable to represent whether a newline had already been omitted, but that's obviously not an acceptable solution

One hacky alternative I have seen in real life code bases is to store the state along the formatting functions, but this would require to make emitting output conditional or to provide another throw-away buffers to "recurse within recursion up to a certain limit".

I think the right solution is for the newlines to be part of the AST, but I don't want to make a design change without approval from zig project leaders.

Unfortunately, 1. this would increase memory consumption for the more relevant execution path of compilation instead of formatting and 2. formatting and alike are recursive, which makes peeking and look-behind painful.

I think thats the right call though until the language is finished and the parser can be made iterative to handle all these formatting things efficiently (formatting function arguments of trailing , should be then most inefficient one).

matu3ba avatar Jun 11 '23 19:06 matu3ba