zig icon indicating copy to clipboard operation
zig copied to clipboard

Coercion to less aligned is not safe

Open michal-dobrogost opened this issue 2 months ago • 22 comments

Zig Version

0.15.2

Steps to Reproduce and Observed Behavior

Coercing a more aligned type to a less aligned type is not safe as the memory is no longer able to be freed correctly. This coercion behaviour disables std.mem.alignedAlloc from enforcing that the returned type has the correct alignment (or flagging the risky behaviour via @alignCast).

This has hit multiple users (+1 for myself now) and is only detected by std.testing.allocator. Other allocators continue silently. https://discordapp.com/channels/605571803288698900/1375373321691992126 https://discordapp.com/channels/605571803288698900/1421631270504632441

Repro:

test "alignedAlloc" {
    const gpa = std.testing.allocator;
    const alignment: std.mem.Alignment = @enumFromInt(4);
    const data: []u8 = try gpa.alignedAlloc(u8, alignment, 12);
    defer gpa.free(data); // [gpa] (err): Allocation alignment 16 does not match free alignment 1.
}

Expected Behavior

Any change in alignment requires @alignCast.

michal-dobrogost avatar Nov 12 '25 16:11 michal-dobrogost

This is the intended behavior https://github.com/ziglang/zig/issues/17668#issuecomment-1776393630

Vexu avatar Nov 12 '25 17:11 Vexu

Thanks for digging that up. It's a shame to accept the status quo here as the current behaviour as is so unexpected, silent and dangerous.

I can put up a PR to at least document the current behaviour.

michal-dobrogost avatar Nov 12 '25 18:11 michal-dobrogost

Re-framing from the issue title here: fundamentally, going from a stricter alignment to a looser (smaller) alignment is safe. The problem is that free() expects to be given the same type that alignedAlloc() returned, and we've instead coerced it to be something else before we hand it back to free(). This alternative version works fine:

test "alignedAlloc" {
    const gpa = std.testing.allocator;
    const alignment: std.mem.Alignment = @enumFromInt(4);
    const raw_data = try gpa.alignedAlloc(u8, alignment, 12);
    defer gpa.free(raw_data); // no error, we gave back what we were originally given
    const coerced_data: []u8 = raw_data; // safe coercion to smaller alignment
    _ = coerced_data; // can be safely used
}

blblack avatar Nov 12 '25 18:11 blblack

We can't use your example to show that alignment coercian is safe because there is no alignment coercian in your code (but there is in the original example).

michal-dobrogost avatar Nov 12 '25 18:11 michal-dobrogost

FWIW here is the doc comment I have ready for alignedAlloc but it doesn't address the underlaying issue just helps prevent some people from blasting their feet off.

+/// Allocates with an alignment which is incorporated into the returned type.
+/// Call `free` when done.
+///
+/// Prefer inferred types like `const data = alignedAlloc(...)` as it will use
+/// the correctly aligned type.
+///
+/// Avoid explicit types like `const data: u8[] = alignedAlloc(...)` as they
+/// can change the alignment type information needed when calling `free`.
+///
+/// If alignment is not comptime known use `rawAlloc` and `rawFree`.

michal-dobrogost avatar Nov 12 '25 18:11 michal-dobrogost

We can't use your example to show that alignment coercian is safe because there is no alignment coercian in your code (but there is in the original example).

The same coercion exists in both examples. In my version, I'm just not using the coerced copy of the pointer when I call free().

blblack avatar Nov 12 '25 18:11 blblack

The earlier example, but with types made explicit:

test "alignedAlloc" {
    const gpa = std.testing.allocator;
    const alignment: std.mem.Alignment = @enumFromInt(4);
    const raw_data: []align(16) u8 = try gpa.alignedAlloc(u8, alignment, 12);
    defer gpa.free(raw_data); // no error, freed what we were actually given (align(16))
    const data: []align(1) u8 = raw_data; // coerces from align(16) to align(1) safely
    _ = data;
}

blblack avatar Nov 12 '25 18:11 blblack

Can you coerce and then pass into free safely?

michal-dobrogost avatar Nov 12 '25 18:11 michal-dobrogost

Can you coerce and then pass into free safely?

No, not without coercing it back to the original type. That was the point I'm trying to re-frame here. The fundamental issue here is not really about coercion or @alignCast(), it's about the Allocator API requiring that free() be given the same type that an allocation function originally returned. The effective contract with the Allocator API is not just a pointer, it's a specific type of pointer.

blblack avatar Nov 12 '25 18:11 blblack

Exactly, free depends on the alignment information encoded in the type which combined with alignment coercion rules results in unsafe code.

The structure of your counter example can be used to make the statement "accessing undefined is defined":

test "definedUndefined" {
    var x: u64 = 42;
    std.debug.print("{}\n", .{x});
    x = undefined;
}

michal-dobrogost avatar Nov 12 '25 18:11 michal-dobrogost

With the distinction that the unsafe operation for undefined values is accessing them, and for alignment coerced values it is freeing them.

michal-dobrogost avatar Nov 12 '25 18:11 michal-dobrogost

Not all values or types even interact with allocation at all. At a language level, IMHO, alignment casting and coercion are designed correctly. We don't want to have to explicitly downcast from greater to lesser alignment; that doesn't make fundamental sense in most cases. For example:

test "alignment" {
    var my_data: [16]u8 align(16) = @splat(0);
    my_data[12] = 12;
    const my_slice: []align(16) u8 = my_data[0..16];
    const where = std.mem.findScalar(u8, my_slice, 12);
    try std.testing.expect(where == 12);
}

findScalar() takes a normal []u8 argument here and doesn't need higher alignment to do its job. Just because I've aligned my data for some other reason, doesn't mean I should have to explicitly @alignCast() this every time I call some stdlib function that's fine with any alignment of []u8. In a very fundamental sense, coercing to lesser alignment is always ok. For example, align(16) satisfies a requirement of align(8) (but not the other way around!).

I'm asserting that your real issue here is not with the language, but with the Allocator interface, and specifically the fact that while free() takes an anytype, it actually requires a very specific type, which is the exact type originally handed out by its allocation function earlier when it allocated the pointer in question. The type info is carrying important infomation that the allocator implementation needs to know. That is a bit of a footgun in some senses, but it's an Allocator footgun, not a language footgun. I think if you tried to design a better generic Allocator interface that makes this a compile error, though, you'll find other unacceptable tradeoffs.

blblack avatar Nov 12 '25 19:11 blblack

Agreed - some code doesn't interact with allocation at all. I think that sidesteps the issue. A lot of (most?) code does and we should consider it.

IMO memory management is so fundemental that it is closely tied to the langauge itself (eg. zig defer, c++ destructors, java garbage collection, rust linear types etc). I have a hard time thinking about a langauge fully in isolation of memory management even if it's implmented in the std lib.

That aside, I think that the language is complicit in this issue. std.mem.Allocator couldn't do what it does without the interaction between alignment coercion rules and type reflection.

Alignment is unique among the coercion rules previously discussed in https://github.com/ziglang/zig/issues/17668#issuecomment-1776393630. It has several unique characteristics (see code below):

(1) Alignment coercion is the only one where information is changed with abilities unaffected. The other coercion cases result in lost information or lost abilities.

(2) All other cases only exist at the language level but alignment also matters at the machine level. This means that some code will care about making use of alignment information. For example, SIMD cares about memory alignment and so an implementation can choose to use SIMD or not based on the alignment type info. This can result in a silent performance degradation when the language could have flagged the alignment typeinfo change to the reader instead.

test "coercion" {
    // Lost ability to modify the value.
    var v: u8 = 42;
    const p0:*u8 = &v;
    const p1:*const u8 = p0;
    std.debug.print("p0:{}\np1:{}\n\n", .{@typeInfo(@TypeOf(p0)), @typeInfo(@TypeOf(p1))});
    // .is_const = false
    //           = true

    // Lost information about error set
    const ErrorSet = error{ MyNewError };
    const e0: ErrorSet  = ErrorSet.MyNewError;
    const e1:anyerror = e0;
    std.debug.print("e0:{}\ne1:{}\n\n", .{@typeInfo(@TypeOf(e0)), @typeInfo(@TypeOf(e1))});
    // .error_set = { .{ .name = { ... } } }
    //            = null

    // Lost information about pointer type
    const o0:*anyopaque = p0;
    std.debug.print("p0:{}\no0:{}\n\n", .{@typeInfo(@TypeOf(p0)), @typeInfo(@TypeOf(o0))});
    // .child = u8
    //        = anyopaque

    // Changed existing alignment information.
    const a0: [1]u8 align(16) = @splat(v);
    const a1: *align(16) const u8 = &a0[0];
    const a2: *align(1) const u8 = a1;
    std.debug.print("a0:{}\na1:{}\n\n", .{@typeInfo(@TypeOf(a1)), @typeInfo(@TypeOf(a2))});
    // .alignment = 16
    //            = 1
}

Fundementally I have a hard time understanding why being explict about decreased alignment is a bad thing. I understand that for most access it doesn't matter but there are cases when it does. The burden seems minimal as you only have to pay the explicit tax if the code cares about alignment.

Funnily enough, reading through the zen of zig (+ claude's added blurb), I surprised myself how many items seem to apply. Perhaps I'm biased.

  1. Communicate intent precisely - Avoid ambiguity and implicit behavior that forces readers to guess.
  2. Edge cases matter - Don't ignore unusual or boundary conditions.
  3. Favor reading code over writing code - Verbosity is acceptable if it aids understanding.
  4. Compile errors are better than runtime crashes - Catch problems at compile-time whenever possible.
  5. Reduce the amount one must remember - Minimize special cases, exceptions to rules, and gotchas.
  6. Resource allocation may fail; resource deallocation must succeed

michal-dobrogost avatar Nov 12 '25 21:11 michal-dobrogost

Other allocators continue silently.

I just want to point out that std.heap.DebugAllocator(.{ .safety = true }), which is the default in Debug/ReleaseSafe, logs error(gpa): Allocation alignment n does not match free alignment m and stack traces (but then continues like nothing happened which surprised me a bit, I would have expected a panic). The Zig modus operandi for protecting yourself from memory unsafety and leaks is by writing tests that use std.testing.allocator and by using a validating allocator like DebugAllocator in debug builds.

That said, I agree that it's a pitfall. However, consider if the signature of free() was changed to something like this:

fn freeWithOptions(
    allocator: Allocator,
    T: type,
    comptime alignment: ?Alignment,
    comptime sentinel: ?T,
    slice: FreeWithOptionsPayload(T, alignment, sentinel),
) void {
    allocator.free(slice);
}

fn FreeWithOptionsPayload(comptime Elem: type, comptime alignment: ?Alignment, comptime sentinel: ?Elem) type {
    if (sentinel) |s| {
        return [:s]align(if (alignment) |a| a.toByteUnits() else @alignOf(Elem)) const Elem;
    } else {
        return []align(if (alignment) |a| a.toByteUnits() else @alignOf(Elem)) const Elem;
    }
}

You are now protected against mistakes like these:

const data: []u8 = try gpa.allocWithOptions(u8, 10, .fromByteUnits(4), 0);
defer gpa.freeWithOptions(u8, .fromByteUnits(4), 0, data);
repro.zig:?:?: error: expected type '[:0]align(4) const u8', found '[]u8'

But in truth, we've kinda only really traded it for a different footgun:

// initial code
const data = try gpa.allocWithOptions(u8, 10, null, null);
defer freeWithOptions(gpa, u8, null, null, data);

// some time later we realize the data should be aligned and 0-terminated
const data = try gpa.allocWithOptions(u8, 10, .fromByteUnits(4), 0);
defer freeWithOptions(gpa, u8, null, null, data); // oops! forgot to change this line!

Now we're back to square one.

I don't know if it's fully possible to design an allocator API that doesn't have this problem in one form or another without removing sentinel/alignment-dropping coercions from the language.

castholm avatar Nov 12 '25 22:11 castholm

I don't know if it's fully possible to design an allocator API that doesn't have this problem in one form or another without removing sentinel/alignment-dropping coercions from the language.

Well, just to illuminate the dark things hiding in the corners: it's also possible with some kind of fat return value that has to be unwrapped (e.g. imagine if alloc() and friends returned some kind of AllocatedObject(T) which carried the additional type info with it and had to be the argument to free() and friends). Seems like it would be terribly un-ergonomic and bloaty for the common case, though. It's also possible if you mandated that all the Allocator implementations track alignment and size for themselves (some kinds do this anyways, but for many other cases this would just be a different kind of senseless bloat inside of the allocator, instead of outside of it). I think these things are terrible ideas, of course.

But for me personally, I like things where they stand now. The simple rule to follow, if you're not wanting to get into the weeds and really understand things, is: don't coerce the return type of an Allocator function into something else and then pass that to other Allocator functions. const foo = alloc.allocAligned(...); defer alloc.free(foo); // ... then coerce for other uses if you need. The slightly-deeper rule is "make sure the type of pointers passed to allocators is the same type they originally returned to you on the initial allocation". If you step outside of the really basic Allocator usage rules, you had best understand what's going on, basically. Every language is going to have some footguns that need calling out. The fact that DebugAllocator and the testing.allocator both catch this case, plus adding some copious, bold warnings to the Allocator interface documentation should be enough, IMHO.

If I'd argue for any change to make this better, I'd argue for some kind of special language feature that should be rarely-ever-used, that can be used to flag the return type of special functions like allocAligned() that tells the compiler to disallow coercion of the return value at all (in that one statement) because of special situations like this. Example (skipping over irrelevant details):

pub fn allocAligned(...) ... !T nocoerce {
    // ...
}

const alignment: std.mem.Alignment = @enumFromInt(4);
const data_err: []u8 = try gpa.allocAligned(u8, alignment, 12); // error, cannot directly coerce this result to a different type!
const data_ok = try gpa.allocAligned(u8, alignment, 12);
defer gpa.free(data_ok);
const data_newalign: []u8 = data_ok; // this is allowed, after the fact.

But even something along those lines seems kinda sketchy to me, and it's still possible to hurt yourself if you try harder :)

blblack avatar Nov 12 '25 23:11 blblack

Another take on nocoerce would be to have it be part of the type instead of a function annotation. An example of the returned type: const data: [](align 4) u8 nocoerce = alignedAlloc(u8, @enumFromInt(4), 12). You could still: use @alignCast to change typeinfo alignment, introduce a @coerceCast and probably need to introduce something for sentinal casting? It does feel like we're getting into C++ like complexity spiral here though.

michal-dobrogost avatar Nov 13 '25 12:11 michal-dobrogost

Are there solid examples other than Allocator where this is a noteworthy footgun?

Also, stepping back and questioning assumptions a bit: are there real allocator implementations (or expected upcoming ones) which both don't track their individual allocations' state (in some way that would easily work around these kinds of issues) and for which passing a less-aligned pointer to free() would cause a real error (or IB or whatever)? When I look at the current suite of non-debug/test heap allocator implementations, none of them seem to care, at least for me on Linux. I can run the original code block without issue with page_allocator, smp_allocator, FixedBufferAllocator, and ArenaAllocator. Is it possible that the real issue here is just testing.allocator and DebugAllocator being overzealous about alignment constraints that shouldn't really matter?

blblack avatar Nov 13 '25 13:11 blblack

Apparently DebugAllocator's implementation actually relies on the alignment passed to free(). But still, in theory, that could be fixed at some cost to DebugAllocator's already not-awesome efficiency, it's an implementation detail.

blblack avatar Nov 13 '25 13:11 blblack

Ah no, SmpAllocator does storage classes based on Alignment. I just got lucky.

blblack avatar Nov 13 '25 13:11 blblack

I built zig with no alignment coercion (one line changed in Sema.zig). The build flags 68 alignment coercions (compared to 3099 instances of intCast in the src folder). This seems like a small price to pay IMO. I'm not sure how the stages of the compiler build work, so please correct me if needed.

Full list of implicit alignment coercions. lib/compiler/aro/aro/Parser.zig:9484:61
lib/std/Io/Threaded.zig:125:38
lib/std/Io/Threaded.zig:4397:21
lib/std/Io/Writer.zig:1128:51
lib/std/Progress.zig:142:31
lib/std/Progress.zig:1438:27
lib/std/Progress.zig:252:29
lib/std/Progress.zig:309:33
lib/std/Progress.zig:374:27
lib/std/Progress.zig:915:57
lib/std/crypto/aegis.zig:298:30
lib/std/crypto/aegis.zig:333:30
lib/std/crypto/aegis.zig:593:30
lib/std/crypto/aegis.zig:628:30
lib/std/crypto/blake3.zig:837:33
lib/std/crypto/blake3.zig:882:25
lib/std/debug/SelfInfo/MachO.zig:589:30
lib/std/debug/SelfInfo/MachO.zig:896:66
lib/std/heap/PageAllocator.zig:113:12
lib/std/heap/PageAllocator.zig:182:22
lib/std/macho.zig:661:26
lib/std/macho.zig:784:26
lib/std/macho.zig:788:26
lib/std/math/big/int.zig:921:53
lib/std/mem.zig:1055:51
lib/std/mem.zig:2296:68
lib/std/mem.zig:727:37
lib/std/mem/Allocator.zig:370:35
lib/std/mem/Allocator.zig:419:23
lib/std/mem/Allocator.zig:445:31
lib/std/multi_array_list.zig:225:34
lib/std/unicode.zig:483:25
lib/std/zig/AstGen.zig:12301:71
lib/std/zig/llvm/Builder.zig:10705:47
lib/std/zig/llvm/Builder.zig:10860:37
lib/std/zig/llvm/Builder.zig:11036:74
lib/std/zig/llvm/Builder.zig:11088:47
lib/std/zig/llvm/Builder.zig:11690:47
lib/std/zig/llvm/Builder.zig:11871:47
lib/std/zig/llvm/Builder.zig:12922:47
lib/std/zig/llvm/Builder.zig:12993:47
lib/std/zig/llvm/Builder.zig:13037:47
lib/std/zig/llvm/Builder.zig:13811:76
src/Air.zig:1811:51
src/Air/print.zig:732:54
src/InternPool.zig:11258:52
src/InternPool.zig:1542:12
src/InternPool.zig:2830:72
src/InternPool.zig:6435:53
src/Sema.zig:19837:43
src/Sema/LowerZon.zig:147:63
src/codegen/aarch64/Select.zig:2722:57
src/codegen/c.zig:5480:42
src/codegen/llvm.zig:3667:68
src/codegen/llvm.zig:7323:56
src/codegen/riscv64/CodeGen.zig:6146:50
src/codegen/sparc64/CodeGen.zig:905:19
src/codegen/spirv/CodeGen.zig:5917:29
src/codegen/spirv/Module.zig:151:47
src/codegen/spirv/Module.zig:170:47
src/codegen/x86_64/CodeGen.zig:177212:54
src/codegen/x86_64/Mir.zig:1709:72
src/link/Coff.zig:1843:47
src/link/Elf/SharedObject.zig:121:42
src/link/Elf/eh_frame.zig:109:52
src/link/MachO.zig:2995:63
src/link/SpirV/BinaryModule.zig:235:71
src/link/SpirV/lower_invocation_globals.zig:311:51

Interestingly std.crypto.aegis hits a misaligned SIMD case:

  • state.absorb() loads the input slice into vectors
  • encrypt() makes an attempt to keep data 16 byte aligned (see src and dst) but alignment of the input m is not enforced nor is there a loop prologue to process the unaligned data so the rest can be processed in an aligned way.

My micro benchmarks didn't show any difference for aligned and misaligned access in this code though and it seems alignment doesn't affect SIMD on modern processors (link).

michal-dobrogost avatar Nov 13 '25 21:11 michal-dobrogost

The problem lies in the fact that this is essentially an ownership issue, somewhat beyond the capabilities of zig. When a coercion occurs, the acquired pointer is considered a reference and loses ownership. It's not that coercions shouldn't occur, but it's generally believed that a transferred pointer shouldn't retain ownership.

Preventing coercions attempts to explicitly notify the user that the pointer has been transferred, but this is still an indirect measure for security.

npc1054657282 avatar Nov 17 '25 17:11 npc1054657282

That's an illuminating perspective. Even if there was ownership tracking, allowing this coercion during ownership transfer would be a mistake, so I'm not sure that ownership is the core issue. In the context of zig where assignments may or may not be ownership transfers the conservative behaviour would be to assume ownership transfer.

At the core I believe there is an inconsistency on what the definition of alignment is between the coercion rules and std.mem.Allocator: (1) coercion rules believe alignment indicates the minimum alignment of the data in memory. (2) std.mem.Allocator believes it means the exact alignment requested during allocation.

michal-dobrogost avatar Nov 24 '25 15:11 michal-dobrogost