
Proposal: initial `@bitCast` semantics (packed + vector + array)


This is a partial resurrection of #10547 with an initially reduced scope, taking into account the packed struct changes that have landed since then.

The status quo implementations of `@bitSizeOf` and `@bitCast` are inconsistent across different types and fairly unimplementable across the various backends. According to the original rejected proposal, `@bitCast` now has the semantics of loading from a `@ptrCast`ed pointer; however, this is plainly not true of the status quo due to #17802, which changed a `@ptrCast` to a `@bitCast` with the explicit goal of fixing undefined behavior (related to load/store sizes). I'm also not convinced that this would be a usable definition anyway, since even a simple value like `var x: u20 = 0xABCDE;` may be represented in memory in many different ways, depending on the target and backend (a sketch for observing this follows the list below):

  • DE BC XA (current behavior of little-endian targets with the llvm backend)
  • EX CD AB
  • DE BC XA XX (current behavior of little-endian targets with the c backend, and on the x86_64 backend)
  • EX CD AB XX
  • XX DE BC XA
  • XX EX CD AB
  • AB CD EX
  • XA BC DE (current behavior of big-endian targets with the llvm backend)
  • AB CD EX XX
  • XA BC DE XX
  • XX AB CD EX
  • XX XA BC DE (current behavior of big-endian targets with the c backend)
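
For concreteness, here is a minimal sketch, not part of the proposal, of how the representation can be observed on a given compiler; which of the layouts above gets printed depends entirely on the target and backend:

```zig
const std = @import("std");

pub fn main() void {
    const x: u20 = 0xABCDE;
    // Reinterpret the variable's storage as raw bytes. Both @sizeOf(u20)
    // and the placement of the padding nibble(s) are target/backend specific.
    const bytes = std.mem.asBytes(&x);
    std.debug.print("{}\n", .{std.fmt.fmtSliceHexLower(bytes)});
}
```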

However, I think an intrinsic like `@bitCast` should be defined in a way that does not invoke this complexity, whereas it seems perfectly reasonable, and indeed necessary, to define pointer casting in terms of the target- and backend-specific memory layout. Additionally, since pointer casting is already legal, an intrinsic defined precisely in terms of it would add no functionality to the language. It could be argued that two constructs with identical semantics also violate "only one obvious way to do things".

This means that `@bitCast` actually needs a specific definition (such as in a language spec :roll_eyes:), but since it currently has none, it has different semantics for different types and is implemented inconsistently across the compiler. By defining `@bitCast` in a target- and backend-agnostic way, this operation becomes "safer" in some sense than `@ptrCast`, since you don't have to worry about it behaving differently on a big-endian target, for example. I believe this leads to a clear delineation of use cases that makes `@bitCast` worth having in the language as a separate concept.

The main motivation for resurrecting this proposal, and an argument that was not explored in the original proposal, is the effect of `@bitCast` on vectors. With vectors rightly not having a well-defined memory layout (given the wide variety of vector semantics across architectures), we lose the ability to convert between differently packed vectors, or even just between `@Vector(8, bool)`, `@Vector(8, u1)`, `@Vector(8, i1)`, `u8`, and `i8`. While `@bitCast` could be defined elementwise on vectors, and it's possible to convert from `bool` with `@select` and to `bool` with comparisons, that doesn't solve the use case of converting a vector to an integer (see the sketch below).
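
As a sketch, these are the conversions the proposal would make well-defined and portable; this illustrates the proposed behavior, not necessarily what any status quo backend does today:

```zig
const std = @import("std");

test "vector <-> integer conversions under the proposal" {
    const mask: u8 = 0b1010_0110;
    // Proposed: bit i of the integer corresponds to element i of the vector.
    const bits: @Vector(8, u1) = @bitCast(mask);
    const bools: @Vector(8, bool) = @bitCast(bits);
    const back: u8 = @bitCast(bools);
    try std.testing.expectEqual(mask, back);
}
```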

I am going to start off with the reasonable assumptions that `@bitSizeOf` should work for all types that are allowed for `@bitCast`, and that `@as(To, @bitCast(@as(From, from)))` requires that `@bitSizeOf(To) == @bitSizeOf(From)` and performs a copy of that number of bits (illustrated below). The open questions are which types should be allowed and how the order of these bits is defined for each of those types. I propose starting with a limited, fairly uncontroversial set and leaving more complicated cases for a future proposal, in order to unblock progress on the backends more quickly.
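
For example, under this rule a 20-bit packed struct and a `u20` are always interconvertible, regardless of how many padding bytes either occupies in memory. A minimal sketch (the `Flags` type and `toInt` helper are hypothetical names for illustration):

```zig
const std = @import("std");

const Flags = packed struct {
    a: bool,
    b: bool,
    rest: u18,
};

comptime {
    // The cast below is allowed because both types are exactly 20 bits wide,
    // even though @sizeOf (the in-memory size) may differ by target/backend.
    std.debug.assert(@bitSizeOf(Flags) == @bitSizeOf(u20));
}

fn toInt(f: Flags) u20 {
    return @bitCast(f); // copies exactly 20 bits
}
```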

The proposed types to be allowed initially, along with the value that `@bitSizeOf` would return (a comptime sketch checking a few of these values follows the list):

  • packable types (allowed as the type of a packed struct field)
    • `void`: 0 bits
    • `bool`: 1 bit
    • `uN`: N bits
    • `iN`: N bits
    • `fN`: N bits
    • `*T`, `?*T`, `[*]T`, `?[*]T`, `[*c]T`, `usize`, `isize`, for runtime-allowed `T`: `@bitSizeOf(usize)` bits (note that these are not allowed as the type of a `@bitCast`, in favor of `@ptrFromInt`, `@intFromPtr`, and `@ptrCast`)
    • `enum(T)`: `@bitSizeOf(T)` bits (note that this is not allowed as the type of a `@bitCast`, in favor of `@enumFromInt` and `@intFromEnum`)
    • `packed struct(T)`: `@bitSizeOf(T)` bits
    • packed union: `comptime size: { var size = 0; for (@typeInfo(U).Union.fields) |field| size = @max(size, @bitSizeOf(field.type)); break :size size; }` (note that https://github.com/ziglang/zig/issues/19754#issuecomment-2073523400 will vastly simplify this to just `@bitSizeOf(T)`, as in the previous case)
  • `[N]T`, for runtime-allowed `T`: `N * @bitSizeOf(T)` bits
  • `@Vector(N, T)`, for runtime-allowed `T`: `N * @bitSizeOf(T)` bits (note that this is currently a packable type, but I don't think it should be, given that arrays aren't allowed)
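
Here is the comptime sketch referenced above, checking a few of the proposed values (the `U` union is a hypothetical example). The array and vector rules are the proposal's, so a status quo compiler may disagree with those two assertions:

```zig
const std = @import("std");

const U = packed union {
    small: u8,
    big: u24,
};

comptime {
    std.debug.assert(@bitSizeOf(void) == 0);
    std.debug.assert(@bitSizeOf(bool) == 1);
    std.debug.assert(@bitSizeOf(u20) == 20);
    // A packed union is as wide as its widest field (24 bits here).
    std.debug.assert(@bitSizeOf(U) == 24);
    // Proposed: N * @bitSizeOf(T), with no padding between elements.
    std.debug.assert(@bitSizeOf([3]u20) == 3 * 20);
    std.debug.assert(@bitSizeOf(@Vector(8, bool)) == 8 * 1);
}
```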

If, for each type, you number bits from lsb to msb starting at the first field of a packed struct, or at the first element of an array or vector, then `@bitCast` copies each numbered bit of the source type to the same-numbered bit of the destination type (see the worked example below). This matches the way `packed struct` already orders bits and is meant to be consistent with it.
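
A worked example of the numbering rule (the `Pair` type is a hypothetical name); this matches status quo packed struct behavior, which the proposal extends to arrays and vectors:

```zig
const std = @import("std");

const Pair = packed struct {
    lo: u4, // bits 0..3 (the first field starts at the lsb)
    hi: u4, // bits 4..7
};

test "bits are numbered lsb-first from the first field" {
    const p: Pair = @bitCast(@as(u8, 0xAB));
    // Bits 0..3 of 0xAB are 0xB and land in `lo`; bits 4..7 are 0xA.
    try std.testing.expectEqual(@as(u4, 0xB), p.lo);
    try std.testing.expectEqual(@as(u4, 0xA), p.hi);
}
```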

Types to consider for future proposals:

  • Error sets with the same semantics as the "error int type".
  • Error unions with a defined order between the error and the payload.
  • Non-pointer optionals with a defined position and meaning of the extra bit.
  • All structs with valid field types; bits would be accumulated in field declaration order, unrelated to memory layout and ignoring padding.
  • Unions, but it is an open question how to define this.

Related:

  • #8102
  • #10547
  • #16677
  • #17645
  • #18652
  • #18936
  • #19660
  • #19754

jacobly0 · Apr 24, 2024