Proposal: pass default value to `std.mem.Allocator.create` etc
This proposal competes with #15764.
In Zig, you typically have to explicitly write `undefined` to get an undefined value. There are exceptions (e.g. `@intFromPtr(&{})` can give you `@as(usize, undefined)` without ever explicitly spelling that out), but as a general rule, it's considered bad form for a language feature or API to give you undefined values without you explicitly acknowledging that.
There is one obvious part of the standard library where this rule is not followed: allocators! Allocators always return uninitialized memory. This is familiar to those of us coming from C, but when you stop and think about it, it is actually pretty difficult to justify. If I want a local variable to be uninitialized, I write `var x: u32 = undefined`, because this is far more clear; why do we not apply the same logic to `std.mem.Allocator.create`?
This would also make the most common usage pattern of `create` -- allocating a value, then immediately following it up with an assignment (`ptr.* = ...`) -- more convenient.
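To make that concrete, here is a small sketch; the `Node` type and surrounding function are invented for illustration, and the two-argument `create` is the proposed signature, not the current API:

```zig
const std = @import("std");

const Node = struct {
    value: u32,
    next: ?*Node,
};

fn example(gpa: std.mem.Allocator) !void {
    // Status quo: the allocation returns undefined memory, and the
    // initialization is a separate follow-up statement.
    const a = try gpa.create(Node);
    a.* = .{ .value = 42, .next = null };
    defer gpa.destroy(a);

    // Proposed: allocation and initialization in a single expression.
    const b = try gpa.create(Node, .{ .value = 42, .next = null });
    defer gpa.destroy(b);
}
```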
This proposal suggests changing the implementation of `std.mem.Allocator.create` to the following:
```zig
pub inline fn create(self: Allocator, comptime T: type, init_val: T) Error!*T {
    if (@sizeOf(T) == 0) return comptime &init_val;
    const ptr: *T = @ptrCast(try self.allocBytesWithAlignment(@alignOf(T), @sizeOf(T), @returnAddress()));
    ptr.* = init_val;
    return ptr;
}
```
The `inline` annotation is used here to allow propagation of a comptime-known `init_val`, which is particularly important in the case where it is `undefined`. Without it, an optimizer may not have good reason to inline this function, leading to an actual store of an undefined value to the allocated pointer, which could add up to a lot of wasted CPU cycles. In addition, the `inline` annotation probably helps to avoid binary bloat, at least in debug builds.
Bonus Proposal
One can consider whether we should give the same treatment to alloc. A direct analogy would effectively transform it into dupe; however, this is unhelpful, as you typically don't have a slice to hand while you're trying to construct another slice! So, the only option seems to be to take an init_val: T and @memset the entire slice to this value. If we want to consider this, I propose it should apply to the following methods:
- `alloc`
- `allocWithOptions`
- `allocWithOptionsRetAddr`
- `allocSentinel`
- `alignedAlloc`
- `allocAdvancedWithRetAddr`
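A rough sketch of what the `alloc` variant could look like (my illustration of the idea, reusing the existing `allocAdvancedWithRetAddr` helper; not a signature from the proposal):

```zig
pub fn alloc(self: Allocator, comptime T: type, n: usize, init_val: T) Error![]T {
    const slice = try self.allocAdvancedWithRetAddr(T, null, n, @returnAddress());
    // Fill every element with the caller-supplied value; passing `undefined`
    // would keep today's behavior of handing back uninitialized memory.
    @memset(slice, init_val);
    return slice;
}
```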
Going even deeper, we can ask whether `resize`/`realloc`/`reallocAdvanced` should take a parameter to initialize added buffer capacity with. However, my opinion is that this would be taking the idea to an unreasonable extreme.
This used to be how it worked already, and then I changed it to status quo to avoid the copy. This is important when considering pinning, for example using a heap allocation to store an async frame.
If I'm not mistaken, wouldn't the new/current semantics of `inline fn` prevent it from being a copy?
Why not a .new instead of .create? And have .new be the one with the default value, and .create be uninitialized (status quo)?
```zig
const allocator = std.heap.page_allocator;

const my = try allocator.new(MyStruct, .{
    .value = 123,
});

// Keep this the same:
var my = try std.heap.page_allocator.create(MyStruct);
```
Everywhere I have observed the codegen for the proposed `pub inline fn create`, it has appeared equivalent to `const ptr = try create(); ptr.* = init;`. I don't know enough about the compiler to say that this is a guarantee. I wish it were, because I've been using this pattern for a while in Bun and other projects and it is amazing.
> Why not a .new instead of .create? And have .new be the one with the default value, and .create be uninitialized (status quo)?
I think `create` returning undefined memory by default is a dangerous property. It's one of the most common ways to get undefined memory without the `undefined` keyword. We can stop this pattern.
In the rare case that you do want a single-item pointer to start as undefined, you can just pass `undefined` as the initializer. An example from Bun (modified to fit mlugg's naming convention):
```zig
const signal = try alloc.create(libuv.uv_signal_t, undefined);
errdefer alloc.destroy(signal);
var rc = libuv.uv_signal_init(global.bunVM().uvLoop(), signal);
// ...
```

```zig
// before
const signal = try alloc.create(libuv.uv_signal_t);
errdefer alloc.destroy(signal);
var rc = libuv.uv_signal_init(global.bunVM().uvLoop(), signal);
// ...
```
Now the intention sticks out, almost a bit too much: there is clearly undefined memory here. I think when I originally committed this it caught a lot of attention in review because of how abrupt it was; but that's good! A reviewer/reader sees this and is instantly aware there is undefined memory here, and pays way more attention to the code.
And then in every other case, creation becomes more concise: instead of the allocation being two statements (alloc + init), it is now a single expression. It's an improvement for everyone as long as the compiler doesn't introduce a copy.
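For example (my own illustration, not code from the thread), the single-expression form composes directly inside a larger initializer:

```zig
const std = @import("std");

const Child = struct { id: u32 };
const Parent = struct { child: *Child };

fn makeParent(gpa: std.mem.Allocator, id: u32) !Parent {
    // With the proposed API there is no need for separate
    // `const c = try gpa.create(Child); c.* = ...;` statements
    // before building the Parent.
    return .{ .child = try gpa.create(Child, .{ .id = id }) };
}
```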
Edit: just realized that in the linked issue I also wrote nearly the exact same comment.
> This proposal competes with #15764.
As the author of that proposal, I'd prefer this solution. It goes further than I did to address the same issue I was trying to cover.
Was just thinking of #4298. When calling `create(T, undefined)`, the `undefined` memory scramble happens in `create`, and therefore couldn't be moved into the allocator implementations. If `alloc` and friends also get this treatment, I'm not sure how that proposal would work.
The first thread I started on Ziggit was about a related idea, an allocWithScalar. That just happened to be the case I was interested in, but generalize it to everything an allocator can allocate and it's basically this proposal.
I'm linking it here since the discussion covers some good points, as a sort of summary:
- A goal with something like this should be to enable "fast zero" allocation. Many systems will have freshly-paged zeros available, and for certain kinds of number crunching this can have a significant positive impact on performance. Big matrices, mostly.
- Zeroed memory is not worth special-casing in the interface, and the goal of getting access to handy zeroed-out memory shouldn't be confused with a suggestion that Zig zero everything out when it allocates. But as a practical matter, zero is the most likely large block of some value for a program to want, and it's one which operating systems frequently make available, as a sort of feedback loop with calloc. The discussion points out some ways in which an allocator could make other default values available, but not using the current interface, leading to the next point:
- Making fast-zero possible would involve changes to how Allocators work right now.
- Those changes might be worth making anyway, but aren't trivial at all. One option is to make Allocator into a special sort of type which Zig knows about; others involve reifying vtables in some fashion. There have been several proposals for the latter; none has been accepted, since none has come up with a design which really fits the language.
I don't have a firm opinion on whether or not allocation-with-default would be worth pursuing without changes to the allocation interface. Maybe. It would definitely create a user-facing interface for a later optimization allowing for fast-zero allocation, which is available in existing systems because it enables good performance for real code. The whole "zero as security feature" and "make the zero value meaningful" movement comes later in the story, and shouldn't be confused with the original purpose.
I'm fully in favor of this proposal, since implicit undefined memory is a huge footgun.
Of note with regard to memory zeroing: LLVM seemingly identifies and optimizes memsets of explicit 0 values by changing the memset call to a bzero call instead. In Ghostty's code a while ago, on a hunch I swapped a `@memset(cells, .{})` for `@memset(@as([]u64, @ptrCast(cells)), 0);` (the default value of the struct is fully 0, by design) and saw that `DYLD-STUB$$bzero` showed up in profiler traces and measured a statistically significant performance improvement. Mnemnion's proposed `allocWithScalar` matches the very common pattern of alloc + `@memset`, and if allocators could specifically optimize for known zeroes that'd be a big win.
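Roughly what that swap looks like, reconstructed from the description above; the `Cell` layout is made up here, the only assumption being that its default value is all-zero bits:

```zig
// Stand-in for the real cell type; defaults chosen to be all-zero bits.
const Cell = packed struct(u64) {
    glyph: u32 = 0,
    style: u32 = 0,
};

fn clearCellsGeneric(cells: []Cell) void {
    // Element-wise fill with the default value; the optimizer may or may
    // not recognize this as a plain zero-fill.
    @memset(cells, .{});
}

fn clearCellsAsZero(cells: []Cell) void {
    // The same effect spelled as an explicit zero-fill over raw u64s,
    // which is what showed up as a bzero call in the profile.
    @memset(@as([]u64, @ptrCast(cells)), 0);
}
```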
> Was just thinking of #4298. When calling `create(T, undefined)`, the `undefined` memory scramble happens in `create`, and therefore couldn't be moved into the allocator implementations. If `alloc` and friends also get this treatment, I'm not sure how that proposal would work.
I think that honestly the two proposals work hand in hand. There's a different meaning to setting the memory to undefined in the allocator vs setting it to undefined in the interface.
- In the allocator, the memory is (or, would be) set to undefined because it's assumed that the allocator offers no guarantees about the memory it allocates;
- In the interface, the memory is set to undefined because even if the allocator does offer guarantees about the memory it allocates, that memory doesn't necessarily have a consistent meaning for the type being created, which could lead to weird implicit behaviors that become footguns (e.g. a user creates a struct, it has zeroed memory, and the default value of their struct happens to be zero, so they assume that `create` uses the default value; then they change the default value to a non-zero value and get unexpected behavior; see the sketch after this list).
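A minimal sketch of that footgun, with invented names:

```zig
const std = @import("std");

const Config = struct {
    // Today the default happens to match zeroed memory...
    retries: u32 = 0,
};

fn looksInitialized(gpa: std.mem.Allocator) !*Config {
    // If the allocator happens to return zeroed pages, this appears to
    // behave like `.{ .retries = 0 }`, but nothing guarantees it.
    const cfg = try gpa.create(Config);
    // If the default later changes to `retries: u32 = 3`, this code
    // silently keeps returning whatever bytes were already in memory.
    return cfg;
}
```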
I think this works perfectly on top of #4298 -- as long as the interface can determine what guarantees (if any) the underlying allocator provides for the memory it allocates, and optimize based on that. For example, if the allocator always returns zeroed memory and the default value bit casts to zero, then the set can be skipped (perhaps assert to the compiler that the bytes are 0 if the underlying allocator guarantees it, so that the compiler can optimize out the set?). And potentially more complex optimizations are available; for example, if I have a large struct that's mostly 0 except 1 non-zero field, only that field need be set.
I largely agree with the design principle that implicit zero initialization shouldn't be meaningful, but it should be made possible for the compiler to understand guarantees that allocators provide and optimize the codegen based on them.
Basically, the simplest way to achieve what I'm imagining is to have allocators carry info about their default value; a `?u8` (with `null` indicating that allocated memory will be undefined) would do. When wrapping one allocator with another, the wrapper can appropriately determine what guarantee it provides based on the underlying allocator's guarantee. The interface functions could then look something along the lines of:
```zig
pub inline fn create(self: Allocator, comptime T: type, init_val: T) Error!*T {
    if (@sizeOf(T) == 0) return comptime &init_val;
    const ptr: *T = @ptrCast(try self.allocBytesWithAlignment(@alignOf(T), @sizeOf(T), @returnAddress()));
    if (self.guarantee) |g| {
        const U = std.meta.Int(.unsigned, 8 * @sizeOf(T));
        const v = [_]u8{g} ** @sizeOf(T);
        assert(@as(U, @bitCast(ptr.*)) == @as(U, @bitCast(v)));
    }
    // Ideally, the above assert is sufficient for the compiler to optimize this out when applicable.
    ptr.* = init_val;
    return ptr;
}
```
(There is very likely a more elegant way to do that assert; I don't know if what I wrote is even valid, it's just to communicate the idea.)
(This does still result in a copy for an undefined `init_val`, though.)
Something to consider: OSes often give you zeroed memory for free. For example, Linux zeroes all anonymous pages returned by mmap, and Windows' VirtualAlloc returns zeroed memory.
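As a quick, non-authoritative way to observe this through `std.heap.page_allocator` (which maps fresh pages for each allocation); note this is OS behavior showing through, not anything the `Allocator` interface promises:

```zig
const std = @import("std");

test "fresh OS pages arrive zeroed" {
    const gpa = std.heap.page_allocator;
    const buf = try gpa.alloc(u64, 4096);
    defer gpa.free(buf);
    // Passes on common OSes because fresh anonymous pages are zero-filled,
    // but no Zig allocator guarantees this; do not rely on it in real code.
    for (buf) |x| try std.testing.expectEqual(@as(u64, 0), x);
}
```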