Deficiency: uninstantiable `union` fields still require a tag value (proposal: exclude them)

Open rohlem opened this issue 1 year ago • 0 comments

Tagged union-s are allowed to have states/fields of uninstantiable (noreturn-like) types - see https://github.com/ziglang/zig/issues/3257, https://github.com/ziglang/zig/issues/15909, and other issues for explanation.

However, status-quo requires that the union's states/fields match the tag enum's states/fields exactly, including states/fields of uninstantiable types. This forces the tag to reserve values it can never hold, an unnecessary inefficiency that can only be worked around by manual reification via @Type, which worsens ergonomics and makes code harder both to read and to write.

Some example code to reference:

/// returned union has 0-3 instantiable states/fields
pub fn U(comptime has_a: bool, comptime has_b: bool) type {
    return union(enum) {
        const FieldA = if (has_a) u8 else noreturn;
        const FieldB = if (has_b) u8 else noreturn;

        a: FieldA,
        b1: FieldB,
        b2: FieldB,

        /// `switch` allows specifying all states/fields,
        /// which improves ergonomics in generic code.
        fn assertNotB2(x: @This()) void {
            switch (x) {
                .a, .b1 => {},
                .b2 => unreachable,
            }
        }
    };
}

comptime {
    const UA = U(true, false);
    const UB = U(false, true);
    const UAB = U(true, true);
    const TagA = @typeInfo(UA).Union.tag_type.?;
    const TagB = @typeInfo(UB).Union.tag_type.?;
    const TagAB = @typeInfo(UAB).Union.tag_type.?;

    const assert = @import("std").debug.assert;
    assert(@bitSizeOf(TagA) == 2); //1 state should require 0 bits
    assert(@bitSizeOf(TagB) == 2); //2 states should only require 1 bit
    // Note: the compiler-generated Tag enum already isn't shared in status-quo
    assert(TagA != TagB);
    assert(TagA != TagAB);
    //for completeness, this is also disallowed in status-quo:
    const E = enum { a };
    const T = union(E) {
        a: void,
        b: noreturn, //error: no field named 'b' in enum 'main.comptime_0__enum_365'
    };
    _ = T{ .a = {} };
}
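
For comparison, the @Type workaround mentioned above could look roughly like the following. This is only a minimal sketch, assuming the Zig 0.12/0.13 field names of std.builtin.Type (`.Enum`, `.Union`, `.layout = .auto`); `ReifiedU` and `Candidate` are hypothetical names chosen for illustration and are not part of the proposal.

```zig
const std = @import("std");

/// Hypothetical userland counterpart to `U` above: only instantiable
/// fields are emitted, so the tag enum has no spare states.
pub fn ReifiedU(comptime has_a: bool, comptime has_b: bool) type {
    const EnumField = std.builtin.Type.EnumField;
    const UnionField = std.builtin.Type.UnionField;
    const Candidate = struct { name: [:0]const u8, present: bool };
    const candidates = [_]Candidate{
        .{ .name = "a", .present = has_a },
        .{ .name = "b1", .present = has_b },
        .{ .name = "b2", .present = has_b },
    };

    // Collect only the fields that would not be `noreturn` in `U`.
    var enum_fields: []const EnumField = &.{};
    var union_fields: []const UnionField = &.{};
    for (candidates) |c| {
        if (!c.present) continue;
        enum_fields = enum_fields ++ &[_]EnumField{.{
            .name = c.name,
            .value = enum_fields.len,
        }};
        union_fields = union_fields ++ &[_]UnionField{.{
            .name = c.name,
            .type = u8,
            .alignment = @alignOf(u8),
        }};
    }
    // This sketch does not handle the zero-field case.
    if (enum_fields.len == 0) @compileError("ReifiedU needs at least one field");

    // Smallest unsigned integer that can hold every tag value.
    const Tag = @Type(.{ .Enum = .{
        .tag_type = std.math.IntFittingRange(0, enum_fields.len - 1),
        .fields = enum_fields,
        .decls = &.{},
        .is_exhaustive = true,
    } });
    return @Type(.{ .Union = .{
        .layout = .auto,
        .tag_type = Tag,
        .fields = union_fields,
        .decls = &.{},
    } });
}

comptime {
    const RA = ReifiedU(true, false);
    const RB = ReifiedU(false, true);
    std.debug.assert(@bitSizeOf(@typeInfo(RA).Union.tag_type.?) == 0); // 1 state
    std.debug.assert(@bitSizeOf(@typeInfo(RB).Union.tag_type.?) == 1); // 2 states
}
```

The price is the one described in this issue: reified types cannot carry declarations such as assertNotB2, and a switch over ReifiedU(true, false) can no longer name .b1 or .b2, because those names do not exist in its tag enum.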

The main issue with status-quo is that the tag type is forced to grow to more bits than necessary. There are two main cases to consider:

  • For user-provided union(T) tag types, I propose simply allowing a smaller enum type than status-quo permits: one that is not required (but is still allowed) to reserve states/values for uninstantiable union states/fields. This boils down to selectively loosening the current state/field equivalence check between the union and the tag type.

  • For compiler-provided union(enum) tag types, it might make sense after #18816 / #19190 to expect the tag type to be deduplicated for types created from the same AST node. However, this currently isn't the case, and I personally don't see the value in doing this. If that particular behavior is desired, an explicit enum type can be created and used instead. Therefore, I propose that the compiler-provided tag type also shouldn't include states/fields for union states/fields with uninstantiable types.


Technically optional: Salvaging (exhaustive) switch

The one additional demand I want to pose here is that the ergonomics of switch should not degrade due to this optimization. I find it highly valuable to be able to write a single switch, include all fields, and re-use that code regardless of which fields are instantiable and which aren't. I believe that today this only works because the tag enum contains all of these fields, which is used as result type of the enum literals in the switch prongs.

In order to not degrade this use case, tagged union types basically need a list of all uninstantiable field names, and those particular names have to be allowed to appear in switch prongs. (Further allowing them in ==/!= comparisons, etc., would also be nice.)

The cleanest implementation I can think of for this would be to include a second full_tag_type in builtin.Type.Union. This full tag type would be used as the "first result location" for type checking, while the actual field enum type is used afterwards; at that point, uninstantiable field names are dropped because they are unreachable.

I realize this last part is a semi-big language feature to propose, but I really think the ergonomic boon would warrant it. That said, there'd be ways for me to work around it in userland, so it's not as critical of a requirement as the first half (which would require reifying all applicable tagged union types with @Type).

rohlem, May 04 '24 13:05