Atomic values should allow vectors
Atomic values currently only allow a pointer, a bool, a float, an integer, or an enum. It would be really useful for a critical part of my project to allow @Vector types to be stored in atomics.
I would be curious to know why you need atomic operations on a SIMD type: can't you operate on each component atomically instead? (I am genuinely curious, not being snarky.) It looks like Intel supports this feature, and so does ARM, but not all operations are supported.
I can, but it would be a lot more convenient to operate atomically on the entire vector at once, instead of working on each element like an array.
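For reference, the per-component workaround looks roughly like this (an untested sketch; accum and addSample are made-up names for illustration):

// Shared accumulator kept as a plain array so each element can be handed to
// the atomic builtins individually.
var accum = [3]u32{ 0, 0, 0 };

fn addSample(sample: @Vector(3, u32)) void {
    const s: [3]u32 = sample; // the vector coerces to a fixed-length array
    for (0..3) |i| {
        // One atomic read-modify-write per component instead of one for the
        // whole vector.
        _ = @atomicRmw(u32, &accum[i], .Add, s[i], .monotonic);
    }
}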
I am trying to understand a use case for this. Could you explain it a bit more?
While I can operate on the vector on a per-component basis, it would be a lot more concise (and faster, if the SIMD instructions are preserved) to operate on the entire vector with another vector, like so:
const std = @import("std");
const expect = std.testing.expect;
const Value = std.atomic.Value;
const monotonic = std.builtin.AtomicOrder.monotonic;
const vec3 = @Vector(3, u32);
test "atomic vector example" {
var vector = Value(vec3).init(vec3{ 1, 2, 3 });
_ = vector.fetchAdd(vec3{ 4, 5, 6 }, monotonic);
try expect(vector == vec3{ 5, 7, 9 });
}
You've described what you're proposing, but when would you need this? What is a real-world usecase?
I'm writing a software raytracer. I have multiple threads that, for each pixel, generate a set number of samples, then combine those samples into the final color for the pixel, which is then reset to zero for the next pixel. That's what I need the atomic vectors for: combining the generated samples into a single vector for the pixel without a dangerous race condition. While I could do it with a mutex, that would be really inefficient, and I would rather eliminate mutexes entirely.
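To make that concrete, the mutex version I'm trying to avoid would look something like this (a sketch, with made-up names):

const std = @import("std");

const Vec3 = @Vector(3, u32);

// Shared per-pixel accumulator guarded by a lock.
var pixel_lock: std.Thread.Mutex = .{};
var pixel_sum: Vec3 = Vec3{ 0, 0, 0 };

fn addSample(sample: Vec3) void {
    pixel_lock.lock();
    defer pixel_lock.unlock();
    // Every sample from every thread serializes on this one lock.
    pixel_sum += sample;
}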
This proposal does not make sense: AFAIK, no mainstream CPU supports atomic vector operations other than plain loads and stores, so this feature cannot be added to Zig; it would require CPU instructions that do not exist.
Not only does this operation not exist on most CPUs, it also wouldn't really be much faster than a mutex, since it has the same amount of contention. You should probably be using a parallel reduction instead.
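For example, the reduction could be shaped roughly like this (a sketch with made-up names): each thread keeps a private running sum using plain SIMD adds, and the partial sums are merged once per pixel:

const Vec3 = @Vector(3, u32);

// Each worker sums its own samples locally; no synchronization in the hot loop.
fn sumSamples(samples: []const Vec3) Vec3 {
    var local = Vec3{ 0, 0, 0 };
    for (samples) |s| local += s; // ordinary vector adds
    return local;
}

// The per-thread partial sums are combined once, outside the sampling loop.
fn combine(partials: []const Vec3) Vec3 {
    var total = Vec3{ 0, 0, 0 };
    for (partials) |p| total += p;
    return total;
}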
Closing since this indeed doesn't seem implementable other than by lowering to compiler-rt functions that use a mutex, which would render the whole exercise a bit silly.