zig icon indicating copy to clipboard operation
zig copied to clipboard

design flaw: semantics of loading values with undefined bits

Open mlugg opened this issue 2 months ago • 3 comments

Whilst working on #19630, it was suggested by @andrewrk to introduce the semantic rule that when loading a value from a pointer, if any bits are undefined, the entire value is considered to be undefined. This rule simplifies the language considerably, because it avoids the need to deal with the awkward concept of a comptime-known value with some undefined bits. However, it causes breakages in some code.

The main example I've noticed is std.PackedIntArray, which uses a backing buffer to store a sequence of integer values bit-packed in memory. The problem is that if we initialize the buffer to undefined, any load which crosses a boundary onto an adjacent element -- even if the element being loaded was set previously -- is considered undefined. Thus, the only way to obey the new semantics is to always @memset the buffer on creation, which is potentially wasteful.

This issue serves to track this design complexity and potential solutions to it. The example I brought up above could potentially be solved with runtime bit-pointers (related: #16271), but I can't say for sure whether that would solve all potential problems.

For now, this rule is never enforced at runtime, only at comptime. Workarounds have been placed in the standard library where necessary behind @inComptime() checks (such checks link to this issue).

mlugg avatar Apr 12 '24 20:04 mlugg

There's also some occurrences of what's definitely a bug: after a store, the value is turned back into its "true" type for our comptime representation. So this code fails:

test {
    comptime var x: u32 = undefined;
    const buf: *[4]u8 = @ptrCast(x);
    buf[0] = 123;
    // The byte is changed, but the value is then read back as a `u32`, so becomes `undefined`!
    // So, this check fails:
    try @import("std").testing.expectEqual(123, buf[0]);
}

This will be solved by adding a representation to MutableValue for a "bundle of bytes" -- or, more accurately, a set of bit-packed values of arbitrary types. This representation will only be applicable to types which are not comptime-only.

mlugg avatar Apr 16 '24 15:04 mlugg

The main example I've noticed is std.PackedIntArray, which uses a backing buffer to store a sequence of integer values bit-packed in memory. The problem is that if we initialize the buffer to undefined, any load which crosses a boundary onto an adjacent element -- even if the element being loaded was set previously -- is considered undefined. Thus, the only way to obey the new semantics is to always @memset the buffer on creation, which is potentially wasteful.

Not sure if it's a good idea or not, but my first thought as to a possible solution:

diff --git a/lib/std/packed_int_array.zig b/lib/std/packed_int_array.zig
index 02c721e7cf..6c8243d9c5 100644
--- a/lib/std/packed_int_array.zig
+++ b/lib/std/packed_int_array.zig
@@ -123,7 +123,7 @@ pub fn PackedIntIo(comptime Int: type, comptime endian: Endian) type {
 
             //read existing bytes
             const target_ptr: *align(1) Container = @ptrCast(&bytes[start_byte]);
-            var target = target_ptr.*;
+            var target = @freeze(target_ptr.*);
 
             if (endian != native_endian) target = @byteSwap(target);
 

jacobly0 avatar Apr 16 '24 17:04 jacobly0

related: https://github.com/ziglang/zig/pull/18137

xxxbxxx avatar Apr 24 '24 05:04 xxxbxxx