zig
zig copied to clipboard
Enum Arrays
This was originally proposed by @raulgrell here: https://github.com/zig-lang/zig/issues/770#issuecomment-368323984
What do you think of an enum array? It has a length equal to the member count of the enum and can only be indexed with an enum value.
const Axis = enum { X, Y, Z};
const vec3 = [Axis]f32 {0.0, 0.0, 0.0};
vec3[Axis.X] == 0.0;
You can get close to this with status quo Zig by specifying the tag type and casting
const Value = enum(u2) { Zero, One, Two };
const vals = [3]i32{3, 4, 5};
vals[u2(Value.Zero)] == 3;
But then if you change the number of elements, the backing type of the enum or override the values, you need to change a lot of code. And you could still access it with arbitrary integers, so if at any point the index into the array was hardcoded, it would have to be found.
An enum array basically becomes a comptime-checked map!
It could be approximately implemented in userland with something like this if we had a memberIndex built-in or something:
fn EnumArray(comptime T: type, comptime U: type) -> type {
return struct {
data: [@memberCount(T)]U,
const Self = this;
fn get(&self: Self, tag: t) U {
return data[@memberIndex(t)];
}
fn set(&self: Self, tag: t, value u) void {
data[@memberIndex(t)] = u;
}
}
}
const value_map = EnumArray(Value, i32) {
.data = []i32{3, 4, 5}
}
value_map.get(Value.Zero) == 3;
value_map.set(Value.Two, 10);
I use enum arrays frequently in C++. Here are some examples:
https://github.com/thejoshwolfe/legend-of-swarkland/blob/46b1916af06e9d7f6829cb6ff552049e5cd38f28/src/thing.hpp#L196
https://github.com/thejoshwolfe/legend-of-swarkland/blob/46b1916af06e9d7f6829cb6ff552049e5cd38f28/src/serial.cpp#L389
The silly hacks (like check_indexed_array()) I do there to make sure I'm doing it right are begging for a language feature. I think this proposal has a lot of value.
Would we have a [Enum]null T too? What would you expect to get in a for? Would we expose the tag? the index? both?
const Axis = enum { X, Y, Z};
const vec3 = [Axis]f32 {0.0, 0.0, 0.0};
vec3[Axis.X] == 0.0;
for (vec3) | val, tag | println("{}: {}", val, tag);
for (vec3) | val, index | println("{}: {}", val, index);
for (vec3) | val, index, tag | println("{}: {}", val, index);
This would be a nice addition for me. I used this approach quite frequently for a non-dynamically allocated game which relied heavily on tables for data and looking up by id. I actually found that when trying to represent similar ideas in rust using enums it was pretty rough going and I switched to just using integer values for the cost of some compile-time safety.
Regarding iteration, would it suffice to just have the following:
for (vec3) |val, tag| {
const value = usize(tag)
}
This also keeps the type returned between normal arrays and enum arrays consistent (item, index_type).
I would expect you would far more often than not use the tag for indexing associated data. There is the argument that you would need to know the appropriate integer size to cast to but if you wanted to do math or other things on the index, I would think you would probably have a good idea on what to pick here.
I would love this feature as well, but how restricted should this feature be? Is this allowed?
const E = enum { First = 1, Last = 1000 };
const arr = [E]i64 { -1, 1 };
My guess is no, but what about a more reasonable example?
const E = enum { First = 5, Second = 6, Last = 7 };
const arr = [E]i64 { -1, 0, 1 };
Here, the compiler would be able to give us an array of size 3, and when indexing, subtract with the 5 offset.
EDIT:
Just reread the proposal. I don't think this @memberIndex(t) exists (it's not in the docs) but if it does, I guess my first example could be allowed.
@Hejsil both examples would work, even without the memberIndex function. You'd have to specify the tag type though.
The following examples include memberIndex and memberTag builtins only for illustrative purposes.
const E = enum(u8) { First = 1, Second = 2, Third = 3, Last = 100 };
const arr = [E]i64 { -1, -2, -3, -100 };
// -1, E.First, 0, 1
// -2, E.Second, 1, 2
// -3, E.Third, 2, 3
// -100, E.Last, 3, 100
for(arr) | val, tag | {
warn("{}, {}, {}, {}", val, tag, @memberIndex(tag), u8(tag);
}
// Would slicing an enum array result in a normal slice or would we have enum slices?
// How would we refer to other parts of the array?
// Enum slice
arr[E.First .. E.Third] == [E.First .. E.Third]i64 {-1, -2, -3};
for(arr[E.First .. E.Third]) | val, tag | {
warn("{}, {}, {}, {}", val, tag, @memberIndex(tag), u8(tag);
const next_tag = @memberTag(@memberIndex(tag) + 1);
warn("{}, {}, {}, {}", arr[next_tag], @memberIndex(tag) + 1, next_tag, u8(next_tag);
}
// Regular slice
arr[E.First .. E.Third] == []i64 {-1, -2, -3};
for(arr[E.First .. E.Third]) | val, index | {
const curr_tag = @memberTag(E, index);
warn("{}, {}, {}, {}", val, index, curr_tag, u8(curr_tag);
const next_tag = @memberTag(E, index + 1)
warn("{}, {}, {}, {}", arr[next_tag], index + 1, next_tag, u8(next_tag);
}
I'm not sure how I feel about it. Ideally, it would provide a third binding:
// Both
for(arr) | val, index, tag | {
warn("{}, {}, {}, {}", val, tag, index, u8(tag);
}
for(arr[E.First .. E.Third]) | val, index, tag | {
warn("{}, {}, {}, {}", val, index, tag, u8(tag);
const next_tag = @memberTag(E, index + 1)
warn("{}, {}, {}, {}", arr[next_tag], index + 1, next_tag, u8(next_tag);
}
See also: #358
While we're at it, we should consider switch statements too. See also: #359
const E = enum(u8) { First = 1, Second = 2, Third = 3, Last = 100 };
const e = E.Third;
switch (e) {
E.First => "Head",
E.Second ... E.Last => "Tail"
}
Well, as long as:
const E = enum(u8) { A = 0, B = 1, C = 2 };
const arr = [E]i64{ -1, 0, 1 };
arr[E.A];
compiles down to the same as:
const E = enum(u8) { A = 0, B = 1, C = 2 };
const arr = [3]i64{ -1, 0, 1 };
arr[u8(E.A)];
Then I'm happy with the feature. I was just pointing out, that the compiler has to add a hidden cost to enum arrays when the enum tags do not map to array index directly.
As for the extra for "payload". I can't see the use case. Maybe we can add enum arrays into the language, see if any real code needs this extra payload, and then talk about it again.
@Hejsil Since the index of the member can be different to the value in the enum, I'd expect that to compile to:
const E = enum(u8) { A = 0, B = 1, C = 2 };
const arr = [3]i64{ -1, 0, 1 };
arr[@memberIndex(E.A)];
In your example, if you changed A to 10, you'd have an index out of bounds error, since the underlying array has length 3. You then wouldn't be able to refer to the first element of the array, since the array can only be indexed using an enum value, and it 'contains' an invalid index.
Then I'm happy with the feature. I was just pointing out, that the compiler has to add a hidden cost to enum arrays when the enum tags do not map to array index directly
All this would happen at compile time, the enum tag's @memberIndex would map directly to the array index, and we'd also guarantee that every enum array access is valid. No runtime cost with added safety!
It would be possible to do stuff like:
// Generate an array from an enum
const PowersOfTwo = enum(u32) { One = 1, Two = 2, Four = 4, Eight = 8 };
const powers_of_two: [E]u32 = undefined;
for (arr) | *val, index, tag | *val = u32(tag);
@raulgrell I think you misunderstood me. I meant if the enum is a direct mapping to array indexes, then the compiler should output equivalent code and I would be happy.
Wait, so enum arrays are only useful for compile-time known indexes? What about user input?
const E = enum { A = 2, B = 100 };
var arr = [E]i64 { 0, 1 };
arr[try strToE(some_user_string)];
In this case, the compiler has to output code that can do the mapping from enum -> array index.
I had only really considered compile-time use, but that's a really good point.
Yeah, in this situation it's not really a zero-cost abstraction, but if you were using this kind of pattern without language support you'd have to do the mapping anyway. So yeah, either way I'm happy.
Hmmm. Here is another question. Is there any value in having an enum array be ordered by tag declaration order?
const A = enum { A = 2, B = 1, C = 0 };
const B = enum { A = 0, B = 1, C = 2 };
// 0 2, 1 1, 2 0,
for ([A]u8 { 0, 1, 2 }) |i, tag| { debug.warn("{} {}, ", i, u8(tag)); }
// 0 0, 1 1, 2 2,
for ([B]u8 { 0, 1, 2 }) |i, tag| { debug.warn("{} {}, ", i, u8(tag)); }
If so, then indexing [A]u8 is slower than indexing [B]u8. If we let enum arrays be unordered, then the compiler is free to optimize [A]u8 to have the same data layout as [B]u8.
Also, maybe this feature requires special initialization syntax:
[A]u8 { .A = 0, .B = 1, .C = 2 }
This ensures that if someone reorders members of A, then the program doesn't break.
In terms of ordering, that's not really an issue - I'm not sure that indexing [A]u8 is any different to indexing [B]u8. I'm still not sure we're on the same page. Could you make an example where the tag value has no relation to the indices, and the child type of the enum array is of a non-indexing type like f32?
Also, maybe this feature requires special initialization syntax. This ensures that if someone reorders members of A, then the program doesn't break.
This is a great suggestion. Somewhere in the documentation there is mention of both enums and packed enums.
// Say the compiler reorders the enum fields by their tag value
const E = enum(u8) { A = 12, B = 2, C = 6}; // @memberIndex(E.A) == 2
// You'd have to initialize it like @Heijsil suggested
const e = [E]f32 { .A = 0.66, .B = 22.1, .C = 42.0 };
// The compiler wouldn't do that in a packed enum
const PE = packed enum(u8) { A = 10, B =12, C =3 }; // @memberIndex(E.A) == 0
// And we could thus initialize it like an array
const pe = [PE]f32 { 0.66, 22.1, 42.0 };
// Structs
const S = struct { a: u8, b: u16, c: u8 }; // @memberIndex(S.b) == 2
const PS = packed struct { a: u8, b: u16, c: u8 }; // @memberIndex(S.b) == 1
// Is this restriction on initialization already the case with structs?
const s = S { .A = 10, .B = 257, .C = 8.};
const ps = PS {10, 257, 8 };
Hmmmm. Just look up the packed enums and they're just TODO'ed in the docs, so I have no idea what they are used for. Maybe they'll get endian properties like structs, discussed in #649?
I like the idea of compiler reordering of enums but not packed enums. It has this nice consistency with structs. That also gives the option for ordered vs unordered enum arrays, not that I have a real use case for ordered enum arrays.
Actually, order matters if you intend to ever treat your enum array as a normal array.
const A = enum { A = 2, B = 1, C = 0 };
const arra = [A]u8 { .A = 5, .B = 3, .C = 27 };
const B = packed enum { A = 2, B = 1, C = 0 };
const arrb = [B]u8 { .A = 5, .B = 3, .C = 27 };
// []u8{ 27, 3, 5 }
const bytesa = ([]u8)(arra);
// []u8{ 5, 3, 27 }
const bytesb = ([]u8)(arrb);
This is useful for shared memory and stuff like that.
@memberIndex can be done in userland now:
fn memberIndex(comptime T: type, tag: T) usize {
const enum_info = @typeInfo(T);
const enum_fields = enum_info.Enum.fields;
inline for (enum_fields) |field, i| {
if (std.mem.eql(u8, field.name, @tagName(tag))) {
return i;
}
}
unreachable;
}
test "memberIndex" {
const Value = enum(u2) {
Zero,
One,
Two,
Three,
};
std.debug.warn("\n{}\n", memberIndex(Value, .Three));
std.debug.warn("{}\n", memberIndex(Value, .One));
std.debug.warn("{}\n", memberIndex(Value, .Two));
}
Output:
3
1
2
EnumArray:
fn EnumArray(comptime T: type, comptime U: type) type {
return struct {
data: [@memberCount(T)]U,
fn get(self: *const @This(), tag: T) U {
return self.data[memberIndex(T, tag)];
}
fn set(self: *@This(), tag: T, value: U) void {
self.data[memberIndex(T, tag)] = value;
}
};
}
test "EnumArray" {
const Axis = enum {
X,
Y,
Z,
};
var vec3 = EnumArray(Axis, f32){ .data = undefined };
vec3.set(.X, 0.54);
vec3.set(.Y, -0.21);
vec3.set(.Z, 0.55);
std.debug.warn("\n{}\n", vec3.get(.X));
std.debug.warn("{}\n", vec3.get(.Y));
std.debug.warn("{}\n", vec3.get(.Z));
}
Note that a branching penalty in memberIndex will be incurred if tag is not comptime-known. This can be eliminated by using a better data structure in memberIndex e.g. comptime-built hash table with a perfect hash function.
I think it might be reasonable to restrict this proposal to untagged enums, or tagged enums that don't override any ordinal values.
The use case for an enum array is (as far as I understand), to give names to the indexes of an array. The use case for specifying the ordinals of an enum is giving names to specific ordinal values. If those two cases ever overlap, you could actually use a constant enum array to look up the custom ordinal value. For example:
const E = enum {
hundred,
thousand,
million,
};
const ord_E = [Value]u32{
hundred = 100,
thousand = 1000,
million = 1000000,
};
test "access enum ordinal value" {
try expect(ord_E[.hundred] == 100);
try expect(ord_E[.thousand] == 1000);
try expect(ord_E[.million] == 1000000);
}
This is slightly more verbose, but I don't think that's necessarily a bad thing in this case. Using an enum with overridden ordinal values implies more overhead, and this makes that overhead explicit. In addition, the extra verbosity is only when defining the enum, because accessing ord_E is about the same amount of code as calling @enumToInt.
Considering the discussion on enum slices, I'm not sure they're a good idea. Enum tags do not necessarily have a strong ordering, so ranges between them could be confusing and fragile. If you have an enum called Color, what does colors[Color.Blue .. Color.Red] mean? Will it stay the same if someone comes along and decides the enum tags would be better ordered alphabetically rather than by the color wheel?
For similar reasons, I think it's a good idea to require struct initialisation syntax for enum arrays.
@MageJohn After reading your suggestions, a question occurs to me: would it be better to treat this concept as a struct with uniform field types, as opposed to an array? Because thinking about it, if all an Enum-Indexed array would do is give names to offsets in a series of values, that's pretty much what structs are for, with the differences being:
- There isn't any enforcement for a struct to have uniform field types.
- You can't associate a numerical value with the struct's field names.
Though, these appear to be relatively trivial to resolve, compared to figuring out what should and shouldn't be allowed in Enum-Indexed arrays, imo. One approach for the first issue would be introducing something like array struct or uniform struct; and one approach for the second issue would be something to the tune of 'tagged structs': const S = struct(Enum) {};
I don't want to derail discussion here too much, in case this idea should be considered separately, or if it's already been considered, so I'll leave it at that, but it's something to consider.
@InKryption I think that's worth discussing, yeah. Implemented correctly, it could neatly sidestep some of the issues discussed so far by reframing the concept. I think there are a few problems though, which I can't see easy solutions for.
At the conceptual level, I'm not sure it maps as neatly onto existing concepts as an enum array. In my mental model the fundamental difference between a struct and an array is that an array is uniform while a struct isn't; this model breaks in the face of uniform struct, but I'm not sure it should because they would probably still compile down to arrays.
There are also questions of access syntax. An use case of tagged structs/enum arrays over structs would be access by runtime known values, which doesn't really work with dot access (e.g. foo.name). The obvious alternative is array access (foo[i]), but if they're supposed to be a special type of struct then using array access could be confusing.
One use case for an enum array is to comptime-compute maximum buffers size and use the enum to point to offsets into the buffer for returning the according slice or (better?) generate at comptime an array.
What I think is missing in the proposal is to clarify if the enum is integer backed, since one could replace it with the offset address into whatever data is pointed to.
Further, I am missing in the proposal, if and how contiguous the pointed data from the enum should be.
If the data itself is contiguous and the backed integer is not necessary (as space waste), then one should be able to implement the comptime-computation of the slice, since one can generalize over the enum fields at comptime.
Can you describe your use cases for non-contiguous data? Does anybody have data on how much performance one can expect from field reordering at maximum/average etc? I would split the backing buffer, if only on a few fields are hot/frequently accessed or manually change the order.
Sketch of implementation with current capabilities of the language:
comptime construction
input:
- array of (enum field, string) ie as array of struct fields
(any other structure that can be comptime-traversed would do)
- backed enum integers being continuous increasing (for simplicity starting at 0)
output:
- concatenated string
- array with (offsets,len) into string
usage:
buf[offsets[2*@enumToInt(enum)..2*@enumToInt(enum)+1]]
As of now, there is currently a miscompilation, which prevents things from working smoothly: https://gist.github.com/matu3ba/07ea081b07639e7ecceb13173083484f
posted in this issue #10920
Isn't this already done? https://ziglang.org/documentation/master/std/#root;enums.EnumArray
@Pyrolistical So far EnumArray is only an an enum available as indexed array. I would close this once indirect indexing at comptime works and has tests.