zig Enum Arrays

trafficstars

This was originally proposed by @raulgrell here: https://github.com/zig-lang/zig/issues/770#issuecomment-368323984

What do you think of an enum array? It has a length equal to the member count of the enum and can only be indexed with an enum value.

const Axis = enum { X, Y, Z};
const vec3 = [Axis]f32 {0.0, 0.0, 0.0};
vec3[Axis.X] == 0.0;

You can get close to this with status quo Zig by specifying the tag type and casting

const Value = enum(u2) {  Zero, One, Two };
const vals = [3]i32{3, 4, 5};
vals[u2(Value.Zero)] == 3;

But then if you change the number of elements, the backing type of the enum or override the values, you need to change a lot of code. And you could still access it with arbitrary integers, so if at any point the index into the array was hardcoded, it would have to be found.

An enum array basically becomes a comptime-checked map!

It could be approximately implemented in userland with something like this if we had a memberIndex built-in or something:

fn EnumArray(comptime T: type, comptime U: type) -> type {
    return struct {
        data: [@memberCount(T)]U,

        const Self = this;

        fn get(&self: Self, tag: t) U {
            return data[@memberIndex(t)];
        }

        fn set(&self: Self, tag: t, value u) void {
            data[@memberIndex(t)] = u;
        }
    }
}

const value_map = EnumArray(Value, i32) {
    .data = []i32{3, 4, 5}
}

value_map.get(Value.Zero) == 3;
value_map.set(Value.Two, 10);

Feb 25 '18 17:02 thejoshwolfe

I use enum arrays frequently in C++. Here are some examples:

https://github.com/thejoshwolfe/legend-of-swarkland/blob/46b1916af06e9d7f6829cb6ff552049e5cd38f28/src/thing.hpp#L196

https://github.com/thejoshwolfe/legend-of-swarkland/blob/46b1916af06e9d7f6829cb6ff552049e5cd38f28/src/serial.cpp#L389

The silly hacks (like check_indexed_array()) I do there to make sure I'm doing it right are begging for a language feature. I think this proposal has a lot of value.

Feb 25 '18 17:02 thejoshwolfe

Would we have a [Enum]null T too? What would you expect to get in a for? Would we expose the tag? the index? both?

const Axis = enum { X, Y, Z};
const vec3 = [Axis]f32 {0.0, 0.0, 0.0};
vec3[Axis.X] == 0.0;

for (vec3) | val, tag | println("{}: {}", val, tag);
for (vec3) | val, index | println("{}: {}", val, index);
for (vec3) | val, index, tag | println("{}: {}", val, index);

Feb 26 '18 00:02 raulgrell

This would be a nice addition for me. I used this approach quite frequently for a non-dynamically allocated game which relied heavily on tables for data and looking up by id. I actually found that when trying to represent similar ideas in rust using enums it was pretty rough going and I switched to just using integer values for the cost of some compile-time safety.

Regarding iteration, would it suffice to just have the following:

for (vec3) |val, tag| {
    const value = usize(tag)
}

This also keeps the type returned between normal arrays and enum arrays consistent (item, index_type).

I would expect you would far more often than not use the tag for indexing associated data. There is the argument that you would need to know the appropriate integer size to cast to but if you wanted to do math or other things on the index, I would think you would probably have a good idea on what to pick here.

Feb 27 '18 08:02 tiehuis

I would love this feature as well, but how restricted should this feature be? Is this allowed?

const E = enum { First = 1, Last = 1000 };
const arr = [E]i64 { -1, 1 };

My guess is no, but what about a more reasonable example?

const E = enum { First = 5, Second = 6, Last = 7 };
const arr = [E]i64 { -1, 0, 1 };

Here, the compiler would be able to give us an array of size 3, and when indexing, subtract with the 5 offset.

EDIT: Just reread the proposal. I don't think this @memberIndex(t) exists (it's not in the docs) but if it does, I guess my first example could be allowed.

Feb 27 '18 09:02 Hejsil

@Hejsil both examples would work, even without the memberIndex function. You'd have to specify the tag type though.

The following examples include memberIndex and memberTag builtins only for illustrative purposes.

const E = enum(u8) { First = 1, Second = 2, Third = 3, Last = 100 };
const arr = [E]i64 { -1, -2, -3, -100 };

// -1, E.First, 0, 1
// -2, E.Second, 1, 2
// -3, E.Third, 2, 3
// -100, E.Last, 3, 100
for(arr) | val, tag | {
    warn("{}, {}, {}, {}", val, tag, @memberIndex(tag), u8(tag);
}

// Would slicing an enum array result in a normal slice or would we have enum slices?
// How would we refer to other parts of the array?

// Enum slice
arr[E.First .. E.Third] == [E.First .. E.Third]i64 {-1, -2, -3};

for(arr[E.First .. E.Third]) | val, tag | {
    warn("{}, {}, {}, {}", val, tag, @memberIndex(tag), u8(tag);
   
    const next_tag = @memberTag(@memberIndex(tag) + 1);
    warn("{}, {}, {}, {}", arr[next_tag], @memberIndex(tag) + 1, next_tag, u8(next_tag);
}

// Regular slice
arr[E.First .. E.Third] == []i64 {-1, -2, -3};

for(arr[E.First .. E.Third]) | val, index | {
    const curr_tag = @memberTag(E, index);
    warn("{}, {}, {}, {}", val, index, curr_tag, u8(curr_tag);

    const next_tag = @memberTag(E, index + 1)
    warn("{}, {}, {}, {}", arr[next_tag], index + 1, next_tag, u8(next_tag);
}

I'm not sure how I feel about it. Ideally, it would provide a third binding:

// Both
for(arr) | val, index, tag | {
    warn("{}, {}, {}, {}", val, tag, index, u8(tag);
}

for(arr[E.First .. E.Third]) | val, index, tag | {
    warn("{}, {}, {}, {}", val, index, tag, u8(tag);

    const next_tag = @memberTag(E, index + 1)
    warn("{}, {}, {}, {}", arr[next_tag], index + 1, next_tag, u8(next_tag);
}

See also: #358

Feb 27 '18 10:02 raulgrell

While we're at it, we should consider switch statements too. See also: #359

const E = enum(u8) { First = 1, Second = 2, Third = 3, Last = 100 };
const e = E.Third;

switch (e) {
    E.First => "Head",
    E.Second ... E.Last => "Tail"
}

Feb 27 '18 11:02 raulgrell

Well, as long as:

const E = enum(u8) { A = 0, B = 1, C = 2 };
const arr = [E]i64{ -1, 0, 1 };
arr[E.A];

compiles down to the same as:

const E = enum(u8) { A = 0, B = 1, C = 2 };
const arr = [3]i64{ -1, 0, 1 };
arr[u8(E.A)];

Then I'm happy with the feature. I was just pointing out, that the compiler has to add a hidden cost to enum arrays when the enum tags do not map to array index directly.

Feb 27 '18 12:02 Hejsil

As for the extra for "payload". I can't see the use case. Maybe we can add enum arrays into the language, see if any real code needs this extra payload, and then talk about it again.

Feb 27 '18 12:02 Hejsil

@Hejsil Since the index of the member can be different to the value in the enum, I'd expect that to compile to:

const E = enum(u8) { A = 0, B = 1, C = 2 };
const arr = [3]i64{ -1, 0, 1 };
arr[@memberIndex(E.A)];

In your example, if you changed A to 10, you'd have an index out of bounds error, since the underlying array has length 3. You then wouldn't be able to refer to the first element of the array, since the array can only be indexed using an enum value, and it 'contains' an invalid index.

Then I'm happy with the feature. I was just pointing out, that the compiler has to add a hidden cost to enum arrays when the enum tags do not map to array index directly

All this would happen at compile time, the enum tag's @memberIndex would map directly to the array index, and we'd also guarantee that every enum array access is valid. No runtime cost with added safety!

It would be possible to do stuff like:

// Generate an array from an enum
const PowersOfTwo = enum(u32) { One = 1, Two = 2, Four = 4, Eight = 8 };
const powers_of_two: [E]u32 = undefined;
for (arr) | *val, index, tag | *val = u32(tag);

Feb 27 '18 13:02 raulgrell

@raulgrell I think you misunderstood me. I meant if the enum is a direct mapping to array indexes, then the compiler should output equivalent code and I would be happy.

Wait, so enum arrays are only useful for compile-time known indexes? What about user input?

const E = enum { A = 2, B = 100 };
var arr = [E]i64 { 0, 1 };
arr[try strToE(some_user_string)];

In this case, the compiler has to output code that can do the mapping from enum -> array index.

Feb 27 '18 13:02 Hejsil

I had only really considered compile-time use, but that's a really good point.

Yeah, in this situation it's not really a zero-cost abstraction, but if you were using this kind of pattern without language support you'd have to do the mapping anyway. So yeah, either way I'm happy.

Feb 27 '18 14:02 raulgrell

Hmmm. Here is another question. Is there any value in having an enum array be ordered by tag declaration order?

const A = enum { A = 2, B = 1, C = 0 };
const B = enum { A = 0, B = 1, C = 2 };

// 0 2, 1 1, 2 0, 
for ([A]u8 { 0, 1, 2 }) |i, tag| { debug.warn("{} {}, ", i, u8(tag)); }

// 0 0, 1 1, 2 2, 
for ([B]u8 { 0, 1, 2 }) |i, tag| { debug.warn("{} {}, ", i, u8(tag)); }

If so, then indexing [A]u8 is slower than indexing [B]u8. If we let enum arrays be unordered, then the compiler is free to optimize [A]u8 to have the same data layout as [B]u8.

Feb 27 '18 14:02 Hejsil

Also, maybe this feature requires special initialization syntax:

[A]u8 { .A = 0, .B = 1, .C = 2 }

This ensures that if someone reorders members of A, then the program doesn't break.

Feb 27 '18 14:02 Hejsil

In terms of ordering, that's not really an issue - I'm not sure that indexing [A]u8 is any different to indexing [B]u8. I'm still not sure we're on the same page. Could you make an example where the tag value has no relation to the indices, and the child type of the enum array is of a non-indexing type like f32?

Also, maybe this feature requires special initialization syntax. This ensures that if someone reorders members of A, then the program doesn't break.

This is a great suggestion. Somewhere in the documentation there is mention of both enums and packed enums.

// Say the compiler reorders the enum fields by their tag value
const E = enum(u8) { A = 12, B = 2, C = 6}; // @memberIndex(E.A) == 2
// You'd have to initialize it like @Heijsil suggested
const e = [E]f32 { .A = 0.66, .B = 22.1, .C = 42.0 };

// The compiler wouldn't do that in a packed enum
const PE = packed enum(u8) { A = 10, B =12, C =3 }; // @memberIndex(E.A) == 0
// And we could thus initialize it like an array
const pe = [PE]f32 { 0.66, 22.1, 42.0 };

// Structs
const S = struct { a: u8, b: u16, c: u8 }; // @memberIndex(S.b) == 2
const PS = packed struct { a: u8, b: u16, c: u8 }; // @memberIndex(S.b) == 1

// Is this restriction on initialization already the case with structs?
const s = S { .A = 10, .B = 257, .C = 8.};
const ps = PS {10, 257, 8  };

Feb 27 '18 15:02 raulgrell

Hmmmm. Just look up the packed enums and they're just TODO'ed in the docs, so I have no idea what they are used for. Maybe they'll get endian properties like structs, discussed in #649?

I like the idea of compiler reordering of enums but not packed enums. It has this nice consistency with structs. That also gives the option for ordered vs unordered enum arrays, not that I have a real use case for ordered enum arrays.

Feb 27 '18 16:02 Hejsil

Actually, order matters if you intend to ever treat your enum array as a normal array.

const A = enum { A = 2, B = 1, C = 0 };
const arra = [A]u8 { .A = 5, .B = 3, .C = 27 };

const B = packed enum { A = 2, B = 1, C = 0 };
const arrb = [B]u8 { .A = 5, .B = 3, .C = 27 };

// []u8{ 27, 3, 5 }
const bytesa = ([]u8)(arra);

// []u8{ 5, 3, 27 }
const bytesb = ([]u8)(arrb);

This is useful for shared memory and stuff like that.

Mar 07 '18 14:03 Hejsil

@memberIndex can be done in userland now:

fn memberIndex(comptime T: type, tag: T) usize {
    const enum_info = @typeInfo(T);
    const enum_fields = enum_info.Enum.fields;
    inline for (enum_fields) |field, i| {
        if (std.mem.eql(u8, field.name, @tagName(tag))) {
            return i;
        }
    }
    unreachable;
}

test "memberIndex" {
    const Value = enum(u2) {
        Zero,
        One,
        Two,
        Three,
    };

    std.debug.warn("\n{}\n", memberIndex(Value, .Three));
    std.debug.warn("{}\n", memberIndex(Value, .One));
    std.debug.warn("{}\n", memberIndex(Value, .Two));
}

Output:

3
1
2

Apr 14 '19 07:04 nodefish

EnumArray:

fn EnumArray(comptime T: type, comptime U: type) type {
    return struct {
        data: [@memberCount(T)]U,

        fn get(self: *const @This(), tag: T) U {
            return self.data[memberIndex(T, tag)];
        }

        fn set(self: *@This(), tag: T, value: U) void {
            self.data[memberIndex(T, tag)] = value;
        }
    };
}

test "EnumArray" {
    const Axis = enum {
        X,
        Y,
        Z,
    };
    var vec3 = EnumArray(Axis, f32){ .data = undefined };
    vec3.set(.X, 0.54);
    vec3.set(.Y, -0.21);
    vec3.set(.Z, 0.55);

    std.debug.warn("\n{}\n", vec3.get(.X));
    std.debug.warn("{}\n", vec3.get(.Y));
    std.debug.warn("{}\n", vec3.get(.Z));
}

Note that a branching penalty in memberIndex will be incurred if tag is not comptime-known. This can be eliminated by using a better data structure in memberIndex e.g. comptime-built hash table with a perfect hash function.

Apr 14 '19 07:04 nodefish

I think it might be reasonable to restrict this proposal to untagged enums, or tagged enums that don't override any ordinal values.

The use case for an enum array is (as far as I understand), to give names to the indexes of an array. The use case for specifying the ordinals of an enum is giving names to specific ordinal values. If those two cases ever overlap, you could actually use a constant enum array to look up the custom ordinal value. For example:

const E = enum {
    hundred,
    thousand,
    million,
};
const ord_E = [Value]u32{ 
    hundred = 100, 
    thousand = 1000, 
    million = 1000000,
};

test "access enum ordinal value" {
    try expect(ord_E[.hundred] == 100);
    try expect(ord_E[.thousand] == 1000);
    try expect(ord_E[.million] == 1000000);
}

This is slightly more verbose, but I don't think that's necessarily a bad thing in this case. Using an enum with overridden ordinal values implies more overhead, and this makes that overhead explicit. In addition, the extra verbosity is only when defining the enum, because accessing ord_E is about the same amount of code as calling @enumToInt.

Dec 15 '21 16:12 MageJohn

Considering the discussion on enum slices, I'm not sure they're a good idea. Enum tags do not necessarily have a strong ordering, so ranges between them could be confusing and fragile. If you have an enum called Color, what does colors[Color.Blue .. Color.Red] mean? Will it stay the same if someone comes along and decides the enum tags would be better ordered alphabetically rather than by the color wheel?

For similar reasons, I think it's a good idea to require struct initialisation syntax for enum arrays.

Dec 15 '21 16:12 MageJohn

@MageJohn After reading your suggestions, a question occurs to me: would it be better to treat this concept as a struct with uniform field types, as opposed to an array? Because thinking about it, if all an Enum-Indexed array would do is give names to offsets in a series of values, that's pretty much what structs are for, with the differences being:

There isn't any enforcement for a struct to have uniform field types.
You can't associate a numerical value with the struct's field names.

Though, these appear to be relatively trivial to resolve, compared to figuring out what should and shouldn't be allowed in Enum-Indexed arrays, imo. One approach for the first issue would be introducing something like array struct or uniform struct; and one approach for the second issue would be something to the tune of 'tagged structs': const S = struct(Enum) {};

I don't want to derail discussion here too much, in case this idea should be considered separately, or if it's already been considered, so I'll leave it at that, but it's something to consider.

Dec 15 '21 17:12 InKryption

@InKryption I think that's worth discussing, yeah. Implemented correctly, it could neatly sidestep some of the issues discussed so far by reframing the concept. I think there are a few problems though, which I can't see easy solutions for.

At the conceptual level, I'm not sure it maps as neatly onto existing concepts as an enum array. In my mental model the fundamental difference between a struct and an array is that an array is uniform while a struct isn't; this model breaks in the face of uniform struct, but I'm not sure it should because they would probably still compile down to arrays.

There are also questions of access syntax. An use case of tagged structs/enum arrays over structs would be access by runtime known values, which doesn't really work with dot access (e.g. foo.name). The obvious alternative is array access (foo[i]), but if they're supposed to be a special type of struct then using array access could be confusing.

Dec 15 '21 22:12 MageJohn

One use case for an enum array is to comptime-compute maximum buffers size and use the enum to point to offsets into the buffer for returning the according slice or (better?) generate at comptime an array.

What I think is missing in the proposal is to clarify if the enum is integer backed, since one could replace it with the offset address into whatever data is pointed to. Further, I am missing in the proposal, if and how contiguous the pointed data from the enum should be.

If the data itself is contiguous and the backed integer is not necessary (as space waste), then one should be able to implement the comptime-computation of the slice, since one can generalize over the enum fields at comptime.

Can you describe your use cases for non-contiguous data? Does anybody have data on how much performance one can expect from field reordering at maximum/average etc? I would split the backing buffer, if only on a few fields are hot/frequently accessed or manually change the order.

Sketch of implementation with current capabilities of the language:

comptime construction

input:

array of (enum field, string) ie as array of struct fields (any other structure that can be comptime-traversed would do)
- backed enum integers being continuous increasing (for simplicity starting at 0)

output:

concatenated string
array with (offsets,len) into string usage: buf[offsets[2*@enumToInt(enum)..2*@enumToInt(enum)+1]]

As of now, there is currently a miscompilation, which prevents things from working smoothly: https://gist.github.com/matu3ba/07ea081b07639e7ecceb13173083484f

posted in this issue #10920

Feb 16 '22 14:02 matu3ba

Isn't this already done? https://ziglang.org/documentation/master/std/#root;enums.EnumArray

Nov 01 '22 17:11 Pyrolistical

@Pyrolistical So far EnumArray is only an an enum available as indexed array. I would close this once indirect indexing at comptime works and has tests.

Nov 02 '22 10:11 matu3ba

zig zig copied to clipboard

Enum Arrays

zig
zig copied to clipboard