wgpu
wgpu copied to clipboard
Array Stride Mismatch in vec3-padded Struct
Description Running a compute shader with an input of an array of struct of 20 bytes will only see the first 20/32 entries.
Repro steps
- Clone https://github.com/froody/wgpu/tree/compute-bug
-
cd examples
-
cargo run hello_compute
Expected vs observed behavior Observed:
result: 189
result with vertex 20: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 0, 0, 0, 0, 0, 0, 0]
result with vertex 32: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Expected:
result: 189
result with vertex 20: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
result with vertex 32: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Extra materials Screenshots to help explain your problem. Validation logs can be attached in case there are warnings and errors. Zip-compressed API traces and GPU captures can also land here.
Platform MacOS 14.3 (23D56), MacBookPro18,2
you have too much padding? you only need to pad to 16 bytes, not 20 bytes
struct Vertex20 {
values: vec3<u32>,
padding: f32,
padding2: u32,
};
@cybersoulK this is just a proof of the bug, In my actual use case I need 20 bytes of data per vertex
then just rename them to fields: [
struct Vertex20 {
values: [f32; 3],
field1: f32,
field2: u32,
_padding: [f32; 3]
};
you need to add actual padding to be alligned to 16
the previous is the cpu side. in wgsl shader you just need:
struct Vertex20 {
values: vec3<u32>,
field1: f32,
field2: u32,
};
(they automatically pad it)
to be clear, only Uniform / Storage buffer require alignment to 16 bytes
there is: https://github.com/teoxoy/encase that replaces bytemuck and creates buffers that matches exactly to WGSL, without needing manual padding. This works by adding a derive macro behind your structs.
@cybersoulK can you prove there's automatic padding injected for (vec3<u32>,f32, u32)
? In the example I provided it's very clear that the shader is striding 20 bytes per index into v20
. If the Vertex20
struct in the shader was being padded then I would expect garbage values at non-aligned locations, e.g:
result with vertex 20: [0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0]
result with vertex 32: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
It seems like the decision for how much buffer to map is being made earlier on and isn't actually related to how the GPU is interpreting the buffer (at least on Metal).
https://gist.github.com/froody/f1d4ec656a2110191ea1618187806ba1
The issue here is as follows:
struct Inner {
vec: vec3<u32>,
scalar1: u32,
scalar2: u32
}
This struct is supposed to be size 32, alignment 16. However, when accessed through an array in metal, the stride shows itself to be 24. I suspect packed_uint3 doesn't have the alignment requirements?
struct Inner {
metal::packed_uint3 vec;
float scalar1;
uint scalar2;
};
Unmark the test as failing on metal in #5264 to see the behavior.
Yeah, https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#page=31&zoom=auto,-462,785
Probably just need to insert an alignas(16) when we're using packed_uint3