zig icon indicating copy to clipboard operation
zig copied to clipboard

allow ranges when switching on enums

Open andrewrk opened this issue 2 years ago • 12 comments

Use case:

pub const Index = enum(u32) {
    pub const first_type: Index = .u1_type;
    u1_type,
    u8_type,
    i8_type,
    u16_type,
    i16_type,
    u29_type,
    u32_type,
    i32_type,
    u64_type,
    i64_type,
    u80_type,
    u128_type,
    i128_type,
    usize_type,
    isize_type,
    c_char_type,
    c_short_type,
    c_ushort_type,
    c_int_type,
    c_uint_type,
    c_long_type,
    c_ulong_type,
    c_longlong_type,
    c_ulonglong_type,
    c_longdouble_type,
    f16_type,
    f32_type,
    f64_type,
    f80_type,
    f128_type,
    anyopaque_type,
    bool_type,
    void_type,
    type_type,
    anyerror_type,
    comptime_int_type,
    comptime_float_type,
    noreturn_type,
    anyframe_type,
    null_type,
    undefined_type,
    enum_literal_type,
    atomic_order_type,
    atomic_rmw_op_type,
    calling_convention_type,
    address_space_type,
    float_mode_type,
    reduce_op_type,
    call_modifier_type,
    prefetch_options_type,
    export_options_type,
    extern_options_type,
    type_info_type,
    manyptr_u8_type,
    manyptr_const_u8_type,
    single_const_pointer_to_comptime_int_type,
    const_slice_u8_type,
    anyerror_void_error_union_type,
    generic_poison_type,
    empty_struct_type,
    pub const last_type: Index = .empty_struct_type;

    pub const first_value: Index = .undef;
    /// `undefined` (untyped)
    undef,
    /// `0` (comptime_int)
    zero,
    /// `0` (usize)
    zero_usize,
    /// `1` (comptime_int)
    one,
    /// `1` (usize)
    one_usize,
    /// `std.builtin.CallingConvention.C`
    calling_convention_c,
    /// `std.builtin.CallingConvention.Inline`
    calling_convention_inline,
    /// `{}`
    void_value,
    /// `unreachable` (noreturn type)
    unreachable_value,
    /// `null` (untyped)
    null_value,
    /// `true`
    bool_true,
    /// `false`
    bool_false,
    /// `.{}` (untyped)
    empty_struct,
    pub const last_value: Index = .empty_struct;

    /// Used for generic parameters where the type and value
    /// is not known until generic function instantiation.
    generic_poison,

    none = std.math.maxInt(u32),

    _,
    pub fn isNoReturn(ty: Type) bool {
        switch (ty.ip_index) {
            InternPool.Index.first_type...@intToEnum(InternPool.Index, @enumToInt(InternPool.Index.noreturn_type) - 1) => return false,

            .noreturn_type => return true,

            @intToEnum(InternPool.Index, @enumToInt(InternPool.Index.noreturn_type) + 1)...InternPool.Index.last_type => return false,

            InternPool.Index.first_value...InternPool.Index.last_value => unreachable,
            .generic_poison => unreachable,

            // TODO add empty error sets here
            // TODO add enums with no fields here
            _ => return false,

            .none => switch (ty.tag()) {
                .noreturn => return true,
                .error_set => {
                    const err_set_obj = ty.castTag(.error_set).?.data;
                    const names = err_set_obj.names.keys();
                    return names.len == 0;
                },
                .error_set_merged => {
                    const name_map = ty.castTag(.error_set_merged).?.data;
                    const names = name_map.keys();
                    return names.len == 0;
                },
                else => return false,
            },
        }
    }

Currently this gives an error for having declarations in the enum. If those are worked around, then it gives this error:

/home/andy/Downloads/zig/src/type.zig:2847:19: error: ranges not allowed when switching on type 'InternPool.Index'
        switch (ty.ip_index) {
                ~~^~~~~~~~~
/home/andy/Downloads/zig/src/type.zig:2848:40: note: range here
            InternPool.Index.first_type...@intToEnum(InternPool.Index, @enumToInt(InternPool.Index.noreturn_type) - 1) => return false,
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I think this should be allowed. It makes sense what should be generated in machine code, and this is more type-safe than the alternative, which does compile:

    pub fn isNoReturn(ty: Type) bool {
        switch (@enumToInt(ty.ip_index)) {
            @enumToInt(InternPool.Index.first_type)...@enumToInt(InternPool.Index.noreturn_type) - 1 => return false,

            @enumToInt(InternPool.Index.noreturn_type) => return true,

            @enumToInt(InternPool.Index.noreturn_type) + 1...@enumToInt(InternPool.Index.last_type) => return false,

            @enumToInt(InternPool.Index.first_value)...@enumToInt(InternPool.Index.last_value) => unreachable,
            @enumToInt(InternPool.Index.generic_poison) => unreachable,

            // TODO add empty error sets here
            // TODO add enums with no fields here
            else => return false,

            @enumToInt(InternPool.Index.none) => switch (ty.tag()) {
                .noreturn => return true,
                .error_set => {
                    const err_set_obj = ty.castTag(.error_set).?.data;
                    const names = err_set_obj.names.keys();
                    return names.len == 0;
                },
                .error_set_merged => {
                    const name_map = ty.castTag(.error_set_merged).?.data;
                    const names = name_map.keys();
                    return names.len == 0;
                },
                else => return false,
            },
        }
    }
}

This has two problems compared to the original example:

  • A wrong type could be used inside the @enumToInt calls
  • It uses else instead of _ so newly added enum tags will not be noticed at this site by the compiler

andrewrk avatar May 02 '23 21:05 andrewrk

I don't like this solution. I believe it is both a bit ugly, and also it solves the problem in a way that is only useful for exactly this use case.

Here is a counter proposal:

Counter Proposal

Consider the following snippet from https://ziglang.org/documentation/master/#switch

test "switch simple" {
    ...
    const b = switch (a) {
        // Switching on arbitrary expressions is allowed as long as the
        // expression is known at compile-time.
        zz => zz,
        blk: {
            const d: u32 = 5;
            const e: u32 = 100;
            break :blk d + e;
        } => 107,
    };
   ...
}

This almost exactly solves the problem. Except for one issue, it only allows you to compile time have singular options.

Proposal

Instead of zz needing to be a comptime T: type = @TypeOf(a) in the above snippet, or a range for integers (special case for integral types), make it so it can ALSO be a comptime known aggregate. i.e you can also switch on a comptime known [N]T, []T, or *[N]T.

Then you can make a helper function like this: This is already valid zig code

pub fn enumFieldRange(comptime E: type, comptime firstVal: E, comptime lastVal: E) []const E {
    const fields = @typeInfo(E).Enum.fields; 

    comptime var start_index: usize = 0;
    comptime var end_index: usize = 0;

    inline for(fields, 0..) |f, i| {
        const val = @intToEnum(E, f.value);
        if (val == firstVal) start_index = i;
        if (val == lastVal) end_index = i;
    }

    comptime var enumArray: [end_index-start_index+1]E = undefined;
    inline for(&enumArray, fields[start_index..end_index+1]) |*e, f| {
        e.* = @intToEnum(E, f.value);
    }

    return &enumArray;
}

And you use it in your switch like this.

switch (...) {
    ...
    enumFieldRange(Index, Index.first_value, Index.last_value) => ...
    ...
}

Or better yet declare the field range as a pub decl inside your enum and give it a long descriptive name.

//inside the enum
pub const Types = enumFieldRange(@This(), .u1_type, .empty_struct_type);
pub const Values = enumFieldRange(@This(), .undef, .empty_struct);

And switch on these instead of having 4 decls.

AssortedFantasy avatar May 03 '23 06:05 AssortedFantasy

I also believe my counter proposal could be useful for other cases. inline else is a bit restrictive although it can be used in a similar way.

Inline else is the deal with it after way of doing it, being allowed to switch on comptime known aggregates is the "deal with before" kinda way.

Also, frankly the fact that comptime can't produce aggregated expressions for switches (like .A,.B,.C =>) just seems like a hole in comptime for no good reason. This above triple can only be expressed as source code.

AssortedFantasy avatar May 03 '23 06:05 AssortedFantasy

yeah being able to switch on an comptime known array I feel is a much more flexible solution, I thought there was a previous proposal for it but I could not find the issue number in my search

nektro avatar May 03 '23 06:05 nektro

  • It uses else instead of _ so newly added enum tags will not be noticed at this site by the compiler

Note that use of the the proposal also weakens the usefulness of exhaustiveness checking - I'm not sure I'd count any switch statement that has a range case (other than integer ranges) as exhaustive. If a tag gets added between the endpoints of a range, the compiler won't flag that you're not handling the new case. It may not be clear at the definition of the enum that the order was important and you might introduce a bug when you expected the compiler to point you to every switch where the new tag needs handling.

dweiller avatar May 03 '23 08:05 dweiller

I don't care which of these proposals is implemented, as long as one is. It produces ugly and less type-safe code to have to convert the enums to int.

As for the complaint by some that enums may not have a proper order so that ranges don't make sense, you could add an (e.g.packed) attribute to an enum definition to say that it is properly ordered and therefore ranges (and ordered comparisons like <=) could be used. Personally, I don't see this as a problem.

dvmason avatar Sep 17 '23 23:09 dvmason

EDIT: Thank you @ianprime0509 for pointing to the proposal I couldn't find and repeated here below: #2473 .

Repeated proposal:

I just thought of a quite similar feature, though I couldn't find the proposal I thought I remembered for it - I think I might have mixed it up with this one, so I'll just post it here: The ability to specify an error set (type value) as a switch prong when switching on an error:

test {
  const E = error{A, B};
  _ = .{E};
  _ = @TypeOf(switch(@as(error{A, B, C}, undefined)) {
    //E => unreachable, //currently not allowed, but could be useful I imagine
    error.A, error.B => unreachable, //status-quo requires explicitly listing all errors
    else => |e| e,
  });
}

EDIT: This would be implementable using a helper function returning a comptime-known [N]anyerror/[]const anyerror value using the feature of the counter-proposal alone, however directly allowing the error set value would be a usability bonus with completely unambiguous meaning IMO. (Note: The else prong of a switch on an error value should keep narrowing the capture's type as it does in status-quo.)

Overlaps between the values of different prongs should continue being a compile error. We could allow overlapping values within the same prong. (In the case of error sets though, code could also just merge them using ||.)

rohlem avatar Feb 21 '24 18:02 rohlem

@rohlem that proposal is #2473.

ianprime0509 avatar Feb 22 '24 01:02 ianprime0509

I like the title, but is there a compelling advantage to having the interspersed const values over just allowing this?

const Beanstalk = enum {
    jack,
    fee,
    fie,
    foe,
    fum,
    beanstalk,
};

fn upTheBeanstalk(up: Beanstalk) []const u8 {
    return switch (up) {
        .jack => "climbs the beanstalk",
        .fee... .fum => "smells the blood of an Englishman",
        .beanstalk => "slides down again",
    };
}

I put a space in .fee....fum to avoid, well, just look at it. The parser could require that space, this is admittedly a regrettable syntax collision.

I would think that this wouldn't work for all enums, like the one from the documentation:

const Value3 = enum(u4) {
    a,
    b = 8,
    c,
    d = 4,
    e,
};

The compiler could error out here with something like "enum used in a range must be monotonic". Three possible rules, from strictest to least, would be a) "only 0...n enums can have ranged switch", b) "only n...m enums can have ranged switch (an initial value is ok)", c) "enums with increasing values can have ranged switch". The b) rule seems like the right balance between complexity and freedom.

There's a rule d) option, "enums can have ranged switch, in the order they're defined, regardless of their underlying integer value", but this seems bad.

In the original proposal, what's the compiler expected to do with this case:

const Value3 = enum(u4) {
    a,
    pub const b_first: Value3 = .b;
    b = 8,
    c,
    d = 4,
    pub const d_last: Value3 = .d;
    e,
};

For the range Value3.b_first ... Value3.d_last?

mnemnion avatar May 16 '24 23:05 mnemnion

@mnemnion In short, your proposal adds a needless restriction. The case items in switch are already allowed to be arbitrary comptime-known values; there is no reason to remove that restriction, and it is beneficial for precisely the use case given in the original post.

[...] what's the compiler expected to do with this case:

Emit a compile error, just as it would for .b ... .d, or indeed for 8...4 on an integer switch. (Equivalently, you can write const b_first = 8; const d_last = 4; today, and use it as a switch range on an integer, to get an error.) All of the semantics here are trivial extensions of existing semantics.

mlugg avatar May 16 '24 23:05 mlugg

Having run into an error when defining tagged unions out of order from their field declarations, I had wrongly assumed that enum switch statements also needed to be in order, with gaps allowed to be handled by an else. So sure @mlugg, I agree the restrictions in a) through c) are unnecessary, and d) would be the most reasonable approach.

The compiler is aware of both the declaration order and the value order, so one could simply define a range of enums as from one tag to any later tag in the field order, so this would be legal:

const Value3 = enum(u4) {
    a,
    b = 8,
    c = 12,
    d = 4,
    e = 9,
};

const v3 = Value3.c;

_ = switch (v3) {
     .e => "e",
     .b... .d => "bcd",
     .a => "a",
};

Right? Seems clear enough, and since enum definitions range from "no numbers provided" to "some numbers provided" to "every number provided", it would be pretty confusing if .b... .d was .b, .e, .d. Switching on the integer value of the enum is always an option if that's what someone wants.

I'm not seeing what defining specific constants within the enum that define ranges gets, relative to just allowing ranges on the enum in its declaration order. It doesn't help that .noreturn_type isn't actually in the Index enum as listed, but plays a central role in the function which switches on it. I'm sure that's a typo/copypasta error but without knowing the placement of that type in the enum definition it's hard to be confident that just direct ranges on the enum values wouldn't work.

I think from looking at it that .noreturn_type belongs between the const last_type and the const first_value so the switch prongs could be

switch (ty.ip_index) {
   .u1_type... .empty_struct_type => ...,
   .noreturn_type => ...,
   .undef... .empty_struct => ...,
   .generic_poison => ...,
   .none => ...,
}

Indeed, since the enum in question uses default values, there's no difference between the integer order and the declaration order. I'm not seeing what the pub const declarations in the enum are adding here, and I can readily imagine cases where different switches would have different ranges: in the VM I'm writing, the opcodes are grouped by function, and there are cases where one range of them might receive uniform treatment (inline range clearly) but other ranges need distinct logic. And vice versa.

And sure, I could add pub const fields defining those ranges, and in my case they wouldn't even overlap, but it seems surplus to requirements.

Is it so that the range is always from before and after .noreturn_type? Because you could get that as .u1_type ... @intToEnum(@enumToInt(.noreturn_type) - 1), that's all comptime, so the compiler can figure out if the result is a valid declared-order enum range or not. I don't think the switch could be compiled if the subtraction were a runtime-known value, so we can rule that out, surely.

The declared order seems like the only sensible way to define an enum range to me. Specifically, this:

[...] what's the compiler expected to do with this case:

Emit a compile error, just as it would for .b ... .d, or indeed for 8...4 on an integer switch. (Equivalently, you can write const b_first = 8; const d_last = 4; today, and use it as a switch range on an integer, to get an error.) All of the semantics here are trivial extensions of existing semantics.

Seems like the confusing and wrong thing to do. In effect that would be type punning on the enums tags and their values. If you switch on tags, ranges should be tag order, if you switch (@enumToInt(anEnum)) then you get to switch on values, and ranges would work the way they already do for integers.

mnemnion avatar May 17 '24 02:05 mnemnion

Throwing an idea here, wouldn't it make more sense to be able to declare enum subsets?

const AnEnum = enum {
    a,
    b,
    ASubset {
        c,
        d,
    },
    e,

    const AnotherSubset = enum (AnEnum) {
        a,
        e,
    };
};
const a_value = AnEnum.a;
const c_value = AnEnum.c;
const another_c_value = AnEnum.ASubset.c;
const another_a_value = AnEnum.AnotherSubset.a;
switch (any_value) {
    .a => {},
    .ASubset => {},
    else => {},
}

(Not quite sure about the syntax, but the idea is here.)

This would solve this issue, while also being able to have "ranges" that are, in fact, not continuous in the enum's declaration.

Lzard avatar May 17 '24 07:05 Lzard

@Lzard enum subsets seem like a good proposal, but maybe not this proposal. There are a lot of questions to answer, all of which seem parallel to the intention here: can a variable be a subset? Can subsets be declared outside of the enum? Can you upcast a subset in a switch statement, if so, how, @as ? Can you downcast in a switch, and is that an error, or an else branch, if the enum value isn't in the subset? This issue doesn't seem like the place to address all of that. It does strike me as a useful feature to at least consider, though there may be prior art (as in, already considered and rejected).

mnemnion avatar May 17 '24 16:05 mnemnion