zig Reassigning union trips inactive field check

Broken by assignment-result-loc.

const U = union {
    A: u32,
    B: u32,
};

test "" {
    var a = U{ .A = 32 };
    a = U{ .B = a.A };
}

Sep 14 '19 21:09 Vexu

OK so this is a tricky issue. This code

const U = union {
    A: u32,
    B: u32,
};

test "" {
    var a = U{ .A = 32 };
    a = U{ .B = 99 };
}

works because it does not reference itself. What's happening here is that there is no intermediate copy. With a = b;, the result location of b is a. So, with

    a = U{ .B = a.A };

First, at a = U{ .B = , Zig sets the "active field" to B. Next, it tries to read a.A into the result location, but by then it's too late: the "active field" is already B, so the safety check fires.

I do think this should work, for the same reason that a = a + 1; works.
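Until that works, one way to sidestep the check is to read the old payload into a local before the assignment, so the right-hand side no longer refers to a. This is a minimal sketch rather than a statement about how the compiler should resolve the issue; the local name old and the test name are made up for illustration.

```zig
const std = @import("std");

const U = union {
    A: u32,
    B: u32,
};

test "reassign union through a temporary" {
    var a = U{ .A = 32 };
    // Read the old payload while A is still the active field.
    const old = a.A;
    // The right-hand side no longer reads from `a`, so writing the
    // new active field first cannot invalidate the value being stored.
    a = U{ .B = old };
    std.debug.assert(a.B == 32);
}
```

The explicit local is the intermediate copy that result location semantics otherwise skips.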

Sep 16 '19 23:09 andrewrk

If you model a = a + b as this function:

fn add(left: *const u32, right: *const u32, result: *u32) void {
    ...
}

add(&a, &b, &a);

then even a = a + b violates some pointer aliasing assumptions (I forget whether Zig has them or not), which matters for the semantics an optimizer is allowed to rely on.
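To make the model concrete, here is a sketch with one possible body filled in. The body is an assumption for illustration (the original leaves it elided), and it only shows that the result pointer and the left operand refer to the same memory at the call site.

```zig
const std = @import("std");

// Hypothetical body for the `add` model: the sum is written through
// `result`, which the caller may have aliased with an operand.
fn add(left: *const u32, right: *const u32, result: *u32) void {
    result.* = left.* + right.*;
}

test "result pointer aliases the left operand" {
    var a: u32 = 1;
    const b: u32 = 2;
    // `&a` is passed both as a read-only operand and as the destination.
    add(&a, &b, &a);
    std.debug.assert(a == 3);
}
```

With a single load and a single store the aliasing is harmless here; it starts to matter once the callee writes part of the result before it has finished reading the operands.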

The case of primitive addition is not going to run into any real trouble with pointer aliasing, but the union case you gave is more serious.

I'm not sure that we want that example to work. I can imagine more complex examples that make the aliasing impossible to detect at compile time.

Sep 17 '19 20:09 thejoshwolfe

While considering the general issue extended to structs, I came to the conclusion that the "obvious semantics" of the right-hand side being fully evaluated before an atomic assignment (if that is the expected null hypothesis) will require a hidden copy under result location semantics - example. Probably worth its own issue if there are differing views in that regard.

But specifically targeting this use case of reassigning a union, I think a simple reordering of operations would suffice:

current behaviour: (compute and) assign tag field followed by compute and assign payload
desired behaviour: compute and assign payload followed by (compute and) assign tag field (expected not to depend on previous tag field - @unionInit (within comptime execution) might require special attention)
If both tag and payload sections are allowed to depend on each other, it's probably easier to construct a copy of the smaller half (in most cases the tag field, I'd assume?), and overwrite that section first.
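The two orderings can be sketched with a hand-rolled stand-in for a safety-checked union. Everything here (Manual, Tag, readA) is hypothetical and says nothing about the layout the compiler actually uses; readA returns null where the real union would trip the inactive-field check.

```zig
const std = @import("std");

const Tag = enum { A, B };

// Hypothetical model of a union with a safety tag.
const Manual = struct {
    tag: Tag,
    payload: u32,

    // Returns null when A is not the active field.
    fn readA(self: *const Manual) ?u32 {
        if (self.tag != .A) return null;
        return self.payload;
    }
};

test "tag-first versus payload-first" {
    // Current behaviour: the tag is written first, so the old field
    // can no longer be read when the payload is computed.
    var x = Manual{ .tag = .A, .payload = 32 };
    x.tag = .B;
    std.debug.assert(x.readA() == null);

    // Desired behaviour: compute and store the payload while A is
    // still active, then switch the tag.
    var y = Manual{ .tag = .A, .payload = 32 };
    y.payload = y.readA().?;
    y.tag = .B;
    std.debug.assert(y.tag == .B and y.payload == 32);
}
```

The sketch covers only the case where the new tag does not depend on the old value, which is the expectation stated above.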

Without a full hidden copy of the previous value, this will however complicate variable-size-tag optimization (pretty sure Zig doesn't have this yet?), e.g.:

U = packed union {a: u6, b: u6, c: u7}; // could fit into a single byte with huffman-ish hierarchical tag value i.e. packed struct {is_c: u1, is_a: u1}

In this scenario, if tag and payload are not disjoint, overwriting either half would be unsafe.

Sep 18 '19 15:09 rohlem