zig Proposal: allow inferred-size array in type annotations

I originally wrote up this proposal in a comment on #5038. I've remained a fan of it since then, so I thought it was worth promoting to a proper proposal.

The biggest blocker to eliminating T{ ... } syntax from the language (which is a broadly discouraged syntax form) is the existence of inferred-size array literals. Today, the syntax [_]T{ ... } defines an array whose length is inferred from the number of elements provided. There is no way to achieve the same thing with type annotations, since [_]T is not an actual type (arrays in Zig always have a fixed length encoded in the type).
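For reference, a minimal sketch of the status-quo behavior described above. The name `inferred` is only illustrative; the point is that the length is taken from the initializer and the result is an ordinary fixed-size array type.

```zig
const std = @import("std");

// Status quo: the length (3) is inferred from the number of elements.
const inferred = [_]u8{ 1, 2, 3 };

// The resulting type is an ordinary fixed-size array; `[_]u8` itself is not a type.
comptime {
    std.debug.assert(@TypeOf(inferred) == [3]u8);
}

pub fn main() void {
    std.debug.print("len = {d}\n", .{inferred.len});
}
```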

I propose that we permit the syntax [_]T in a type annotation on const and var (both local and container scope). It could optionally also be allowed as the operand to @as. Like the [_]T{ ... } syntax, this results in a normal fixed-size array, and is a specific syntax form: for instance, const x: ([_]T) = ... is disallowed, just as ([_]T){ ... } is disallowed today. When this "type" is used, the expression with this type is given a new kind of result location. All peers of this expression must be array initialization literals with lengths matching the expected length.

I think the best way to get an idea of how this would work is to look at the implementation. The new form of result location would be implemented in AstGen like this:

/// This expression is the initialization expression of a var decl whose type is an inferred-length array.
/// Every result sub-expression must use array initialization syntax. The array's length should be written
/// to `chosen_len` so the caller can retroactively set the array length.
inferred_len_array_ptr: struct {
    /// The array pointer to store results into.
    ptr: PtrResultLoc,
    /// This is initially `null`, and is set when an expression consumes this result location.
    /// If an expression has a length which does not match the currently-set one, it can use `src_node` to emit an error.
    chosen_len: *?struct {
        len: u32,
        src_node: Ast.Node.Index,
    },
},

When we encounter the first peer array initialization, its length is written to chosen_len. Each later peer checks that its length matches the recorded length, and emits an error if not. The var/const decl will create a stack allocation whose length in the ZIR is retroactively rewritten to match the length of the initialization expressions.
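The length-agreement rule can be sketched as a standalone toy model. This is not AstGen code: `ChosenLen`, `consumePeer` and the plain `u32` node index are hypothetical stand-ins, assumed only for illustration of the "first peer records, later peers must match" behavior.

```zig
const std = @import("std");

/// Hypothetical stand-in for the payload behind `chosen_len`.
const ChosenLen = struct {
    len: u32,
    src_node: u32, // the real field would be an `Ast.Node.Index`
};

/// Toy model of one peer array initializer consuming the result location.
/// The first peer records its length; every later peer must match it.
fn consumePeer(chosen_len: *?ChosenLen, len: u32, src_node: u32) error{LengthMismatch}!void {
    if (chosen_len.*) |first| {
        // A real implementation would emit a compile error noting `first.src_node`.
        if (first.len != len) return error.LengthMismatch;
    } else {
        const recorded: ChosenLen = .{ .len = len, .src_node = src_node };
        chosen_len.* = recorded;
    }
}

pub fn main() !void {
    var chosen: ?ChosenLen = null;
    try consumePeer(&chosen, 2, 10); // first peer, e.g. `.{ 1, 2 }`
    try consumePeer(&chosen, 2, 11); // second peer agrees
    // A peer of length 3 is rejected, mirroring the mismatched-length case.
    if (consumePeer(&chosen, 3, 12)) |_| {
        unreachable;
    } else |err| {
        std.debug.assert(err == error.LengthMismatch);
    }
}
```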

This kind of result location will immediately trigger an error when encountered for any expression other than array initializers, such as struct inits and any syntax form which calls AstGen.rvalue.

Here is what the proposal looks like in practice:

// these are all valid
const x: [_]u8 = .{ 1, 2, 3 };
const y: [_]u8 = if (condition) .{ 1, 2 } else switch (x) {
    .foo => .{ 3, 4 },
    .bar => .{ 5, 6 },
    else => .{ 7, 8 },
};
const z: [_][]const u8 = blk: {
    if (foo) break :blk .{ "hello", "world" };
    break :blk .{ "foo", "bar" };
};

// this is invalid
// error: array length cannot be determined
// note: result must be array initialization expression
const a: [_]u8 = @as([3]u8, .{ 1, 2, 3 });
const b: [_]i16 = blk: {
    const result: [2]i16 = .{ 1, 2 };
    break :blk result;
};
const c: [_]u8 = if (cond) .{ 1, 2 } else something_else;

// this is also invalid
// error: array length '3' does not match array length '2'
// note: array with length '2' here
// note: inferred-length array must have a fixed length
const d: [_]u8 = if (cond) .{ 1, 2 } else .{ 3, 4, 5 };
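For comparison, a sketch of how the branching case is typically spelled today, where each branch carries the element type itself. The function name `pick` is an assumption made for this example and does not appear in the proposal.

```zig
const std = @import("std");

// Status quo: every branch repeats the element type via `[_]u8{ ... }`.
// Under the proposal, the annotation would carry the type and the branches
// would be plain `.{ ... }` initializers.
fn pick(condition: bool) [2]u8 {
    return if (condition) [_]u8{ 1, 2 } else [_]u8{ 3, 4 };
}

pub fn main() void {
    const y = pick(true);
    std.debug.print("{any}\n", .{y});
}
```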

Implementing this proposal would solve the primary blocker for accepting #5038. I personally believe that proposal to be the right direction for the language, but even if it is not accepted, I feel that this proposal is beneficial, because it brings the language further in line with our preference for direct type annotations over explicitly-typed expressions.

Apr 17 '24 12:04 mlugg

Related: https://github.com/ziglang/zig/issues/9056 proposed a similar idea of a pseudo-/placeholder type [_]T for (comptime) function arguments, mainly for improving the ergonomics around comptime function calls (which under status-quo memoize []const T by pointer and length instead of by element values).

The ideas seem compatible to me, but maybe their discussions should be kept separate (not sure).

This kind of result location will immediately trigger an error when encountered for any expression other than array initializers [...] .

That seems like an unfortunate limitation - it means users have to spell out the size again if they move one branch into its own function. In my head the language would be more regular if array values, from whatever source, were allowed. I assume the main motivation behind this restriction is implementation simplicity?

It could optionally also be allowed as the operand to @as.

Again I would be in favor of this for (my perception of) regularity/consistency. Not sure if my code would be affected, but I imagine it would be annoying having to switch from status-quo f([_]T{...}); to const temp: [_]T = ...; f(temp); due to a (seemingly arbitrary) limitation in language expressiveness.

Apr 17 '24 13:04 rohlem

The ideas seem compatible to me, but maybe their discussions should be kept separate (not sure).

Yes, I think these proposals deserve to be considered independently of one another.

That seems like an unfortunate limitation.

I don't think this is the case. If you're moving one branch into a function for some reason, just move the type annotation too:

const foo = if (xyz) computeFoo() else foo: {
    const foo: [_]u32 = ...;
    break :foo foo;
};

Seems a little odd, sure, but I would raise the point that this seems like a pretty rare thing to need anyway: it's okay for it to be a little non-trivial to write.

In my head the language would be more regular if array values, from whatever source, were allowed. I assume the main motivation behind this restriction is implementation simplicity?

I think it is more conceptually straightforward too to be honest -- but perhaps that's just me being used to status quo and trying to treat this as a "translation" of that concept. The proposal could be amended to support other rvalues here, that wouldn't be too difficult AFAICT -- as long as there was at least one array initialization so that the length was trivially known. Removing that rule would make this far more complex. Thus, I believe the proposal as stated is more reasonable.

Again I would be in favor of this for (my perception of) regularity/consistency.

Personally, I lean a little towards not allowing it -- it feels dirtier to me to special-case a builtin parameter as opposed to a pure syntax construct -- but I don't really have a strong opinion. To be clear, if accepted right now the proposal would not apply this to @as, but it could be amended to if that is the consensus.

Not sure if my code would be affected, but I imagine it would be annoying.

Why could you not take a slice here? I am aware it impacts memoization, but as of the recent type equivalence change, memoization should not have any impact on language semantics.

Apr 17 '24 14:04 mlugg

Implementing this proposal would solve the primary blocker for accepting https://github.com/ziglang/zig/issues/5038. I personally believe that proposal to be the right direction for the language, but even if it is not accepted, I feel that this proposal is beneficial, because it brings the language further in line with our preference for direct type annotations over explicitly-typed expressions.

I'm not sure this would be beneficial without #5038. Without #5038 this is a different way of doing the same thing. It's also a form of special syntax for type annotations which seems unprecedented.

The biggest blocker to eliminating T{ ... } syntax from the language (which is a broadly discouraged syntax form)

Why is it discouraged?

Apr 17 '24 16:04 ethernetsellout

What about blockers like

    const a: [*:null]const ?[*:0]const u8 = &.{ "a", "b", "c" }; // doesn't compile

Are they related/could be covered by this proposal somehow? Or would they need the use of @as and/or auxiliary named constants from then on?

Apr 18 '24 08:04 ni-vzavalishin

@ethernetsellout

Without #5038 this is a different way of doing the same thing.

Equally, const x = T{ ... } is a different way of doing const x: T = .{ ... }. But the second is a preferred syntax (more on this later), so any potential uses for the former don't mean the latter shouldn't exist.

It's also a form of special syntax for type annotations which seems unprecedented.

Sure, but I don't see why it's any worse than the existing special case for [_]T{ ... } syntax. There's a reason I explicitly drew that parallel in the proposal!

Why is it discouraged?

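Because the alternative, a type annotation with an untyped initializer, allows for RLS (result location semantics), is easier for tooling to interpret, and is more consistent for humans to read.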

@ni-vzavalishin

What about blockers like [...]

This code snippet is completely unrelated to this proposal.

Apr 18 '24 09:04 mlugg

I propose that we permit the syntax [_]T in a type annotation on const and var (both local and container scope). It could optionally also be allowed as the operand to @as.

What about function return types? I would love to get rid of this: https://codeberg.org/kiesel-js/kiesel/src/commit/48e2ebc8f8a5f618d2e89960be9ce7e245941755/src/builtins/global.zig#L32-L36

Apr 18 '24 12:04 linusg

Sure, but I don't see why it's any worse than the existing special case for [_]T{ ... } syntax.

The existing special case applies only in a single, easily understandable situation. Moving the [_] into the type signature gives it potentially a lot more flexibility:

const x = @as([_]u8, .{ 1, 2, 3 });

const y = .{ 1, 2, 3 };
const z = @as([_]u8, y);

const T = struct {
    v: [_]u8 = .{ 1, 2, 3 }
};

fn foo() [_]u8 {
    return .{ 1, 2, 3 };
}

fn bar(comptime x: [_]u8) void {
    ...
}
// bar(.{ 1, 2, 3 });

All of these could work in principle... or they could be disallowed, but then you have to explain why. No such questions can arise with the literal notation.

Apr 19 '24 05:04 ghost

The following example from the previous post looks rather far-fetched IMHO:

const T = struct {
    v: [_]u8 = .{ 1, 2, 3 }
};

but if it's accepted I'd argue that Zig also should allow further generalization of the same, where not only the array dimension can be inferred from the default initializer, but also the entire type. Not sure what the syntax would be, probably the following is not the best idea:

const T = struct {
    v = T1.init(0)
};

Apr 19 '24 10:04 ni-vzavalishin

Here is what the proposal looks like in practice:

Would be good to add some examples where the array in question is an unnamed temporary, rather than an initializer for a binding. One of the more frequent uses of [_] syntax for me is ad-hoc for loops a la:

pub fn main() void {
    for ([_][]const u8{ "hello", "world" }) |s| {
        std.debug.print("s = {s}\n", .{s});
    }
}

How would that look under this proposal? It seems that I'd have to add a named temporary here, no?

Apr 19 '24 12:04 matklad

@linusg

The example you give wouldn't work with this proposal even if it were permitted for return types. This isn't proposing anything that relies on semantic analysis: to infer the array length of an expression, it must be a trivial array initialization expression. The use of ++ means the array length is only known to semantic analysis, so this proposal isn't applicable.

@zzyxyzz

Why would you assume that permitting [_]T as the type of a const/var would allow it in those other scenarios? Today, in T{ ... } syntax, T is a normal type expression, except the form [_]U is also permitted. I am proposing that in const/var definition syntax, T is a normal type expression, except the form [_]U is also permitted. I do not see why the former rule is easier to understand than the latter. For parameters, I could at least see that the syntax form is similar (in that the type comes after a :), but for @as and function return types, I don't understand your point whatsoever.

EDIT: it was also just pointed out to me that function parameters are already a special case thanks to the existence of anytype. The fact that nobody finds that particularly confusing seems to put to rest the idea that there might be a confusing ambiguity here.

@ni-vzavalishin

I get that it's an extension of what itself is an extension of my proposal, but what you propose is a completely unrelated issue. Feel free to open your own proposal, but I'm almost 100% confident it would quickly be rejected: there's no reason to allow omitting struct field types like this.

@matklad

To be clear, this proposal does not necessarily depend on #5038. It is possible that this proposal is accepted and #5038 is rejected, in which case that snippet continues to work. However, this proposal does fit best alongside #5038, so assuming both are accepted, then indeed, that snippet will begin to fail. If you really want to do that, the easiest way will indeed be to introduce a temporary. I don't think this is a big loss.

Now, a sincere question: can you give an example of a few places where you've written this? I can count the number of times I've used that pattern on one hand (it's, erm, two). It's incredibly rare that I need to loop, at runtime, over a comptime-known fixed set of values, which is not already a constant somewhere. (I say "at runtime" because inline for can loop over a tuple, so you can just do inline for (.{ "hello", "world" })!)
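The two alternatives mentioned above can be sketched as follows. The function names are hypothetical, and the first variant spells out the length because `[_][]const u8` is not accepted as an annotation today; under this proposal that annotation is what would replace the explicit `2`.

```zig
const std = @import("std");

// Alternative 1: a named temporary with a type annotation, looped over at runtime.
fn totalLenRuntime() usize {
    const words: [2][]const u8 = .{ "hello", "world" };
    var total: usize = 0;
    for (words) |s| total += s.len;
    return total;
}

// Alternative 2: `inline for` iterates over a tuple directly, unrolling the loop
// body once per element.
fn totalLenUnrolled() usize {
    var total: usize = 0;
    inline for (.{ "hello", "world" }) |s| total += s.len;
    return total;
}

pub fn main() void {
    std.debug.print("{d} {d}\n", .{ totalLenRuntime(), totalLenUnrolled() });
}
```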

Apr 19 '24 13:04 mlugg

https://github.com/tigerbeetle/tigerbeetle/blob/bd1280c90d0072cc204be364135edc8d54e8e129/build.zig#L940
https://github.com/tigerbeetle/tigerbeetle/blob/bd1280c90d0072cc204be364135edc8d54e8e129/src/ewah.zig#L309
https://github.com/tigerbeetle/tigerbeetle/blob/bd1280c90d0072cc204be364135edc8d54e8e129/src/state_machine.zig#L3588
https://github.com/tigerbeetle/tigerbeetle/blob/bd1280c90d0072cc204be364135edc8d54e8e129/src/stdx.zig#L691-L696
https://github.com/tigerbeetle/tigerbeetle/blob/bd1280c90d0072cc204be364135edc8d54e8e129/src/clients/node/ci.zig#L31
https://github.com/tigerbeetle/tigerbeetle/blob/bd1280c90d0072cc204be364135edc8d54e8e129/src/vsr/replica_test.zig#L211-L215

(I say "at runtime" because inline for can loop over a tuple, so you can just do inline for (.{ "hello", "world" })!)

Yeah, inline for sometimes can help here, but at the cost of code bloat. My ideal solution would be to make

for(.{a, b, c}) |_| {}

just DWIM. Which perhaps we can do as a part of the current proposal? for(.{ can be treated as implicit [_] perhaps?

Apr 19 '24 13:04 matklad

I think that could be pretty reasonable, but it's a separate proposal IMO. That adds a (tiny) bit more complexity to RLS since you don't have the type being iterated over as a result type, so it's another case to consider.

Here's what I think of each of the uses you link:

1. This is an irritating flaw in the build system; you should be able to use the same root module for the shared and static lib rather than creating it twice. I've been meaning to make a PR to sort this out.
2. I somewhat share Andrew's opinion that this is an antipattern: instead, extract the testing logic into a function, and call it n times. That way, you don't hinder the usefulness of stack traces for no reason.
3. Maybe same as 2, but also, inline for so N/A.
4. Same as 2.
5. inline for so N/A.
6. I don't understand what this is doing well enough to know how I'd write it :laughing:

Apr 19 '24 13:04 mlugg

there's no reason to allow omitting struct field types like this.

Having to write

const T = struct {
  field: FieldType = FieldType.init(arguments),
  ....

sometimes gets annoying, reaching its peak when FieldType is a generic, so here is your reason. Allowing some automatic inference in struct field types like proposed earlier IMHO opens a door for ideas like that. However, unless the latter is accepted, I also rather feel that it would be immediately rejected.

Apr 19 '24 14:04 ni-vzavalishin

I think #9938 is what you really want. You're targeting the wrong bit of syntax: Zig is generally migrating towards having more type annotations, not less.

Apr 19 '24 14:04 mlugg

@mlugg

Why would you assume that permitting [_]T as the type of a const/var would allow it in those other scenarios?

I guess I just do, hard to say why exactly :)

Jokes aside, my point was that your proposal is not really a trivial rewrite anymore (unlike method calls or [_]T{} literals). Especially if extended to @as expressions, which would be the default way of typing a literal outside of a variable initialization if #5038 is accepted, it would feel even less like "one specific syntactic exception" and more like a lazily resolved type constraint. In the latter case I feel it's reasonable to ask why the rest of the examples wouldn't work, since they clearly could, using the same resolution mechanism.

Overall, I see potential for complication in this proposal that is not commensurate with the benefit, which is purely stylistic when considered apart from #5038, AFAICT. It's how I feel about #5038 in general -- it may be slightly nicer in the 90% case, but it creates enough complexities and inconvenience in the remaining 10% to be no longer worthwhile.

My 2c.

Apr 19 '24 14:04 ghost
