wgsl_to_wgpu WGSL structure alignment

References: https://github.com/gpuweb/gpuweb/issues/1393 and https://gpuweb.github.io/gpuweb/wgsl/#memory-layouts. It seems that uniform buffers still require array elements to be aligned to 16 bytes, as in std140.

Mar 11 '22 04:03 ScanMountGoat

This project seems to be reasonably well maintained and handles reading and writing to buffers. https://github.com/teoxoy/encase

Mar 11 '22 04:03 ScanMountGoat

Using bytemuck won't always produce the expected results for uniform buffers and storage buffers since the corresponding WGSL types can have different sizes and alignments than the corresponding type in Rust, glam, etc. The alignment of a type in WGSL is not always equal to its size, which causes compatibility issues with repr(C) structs. This can be solved using encase to serialize the field values individually while properly handling alignment. Using encase may require additional alignment annotations for some field types such as f32.

WGSL also includes @align(N) and @size(N) attributes that should be considered in the generated Rust types. This can also be handled using corresponding attributes in encase.

Using bytemuck is more efficient but only works correctly when using specific layouts such as using vec4<f32> or mat4x4<f32> for all fields in WGSL. It may be worth still supporting this case or generating some sort of error if a struct used for a uniform or storage buffer in WGSL contains problematic field types.
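As a rough illustration of the size/alignment mismatch described above, the following std-only sketch compares the Rust layout of a struct holding a 3-component vector with the size WGSL expects. The `[f32; 3]` and `[f32; 4]` arrays stand in for vector types from glam or nalgebra, and `round_up`, `wgsl_struct_size`, and the struct names are hypothetical helpers chosen for this example, not part of wgsl_to_wgpu.

```rust
use std::mem::size_of;

/// Rust-side mirror of `struct Material { color: vec3<f32> }`.
/// `[f32; 3]` is an assumed stand-in for a 3-component vector type.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct MaterialVec3 {
    pub color: [f32; 3],
}

/// Rust-side mirror of `struct Material { color: vec4<f32> }`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct MaterialVec4 {
    pub color: [f32; 4],
}

/// Round `n` up to the next multiple of `align` (assumed to be a power of two).
pub const fn round_up(align: usize, n: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Size WGSL expects for a struct with a single member of the given
/// size and alignment: the member size rounded up to the struct alignment.
pub const fn wgsl_struct_size(member_size: usize, member_align: usize) -> usize {
    round_up(member_align, member_size)
}

/// Returns (Rust size, WGSL size) for the vec3 version.
/// WGSL gives vec3<f32> a size of 12 bytes and an alignment of 16 bytes.
pub fn vec3_sizes() -> (usize, usize) {
    (size_of::<MaterialVec3>(), wgsl_struct_size(12, 16))
}

/// Returns (Rust size, WGSL size) for the vec4 version.
/// WGSL gives vec4<f32> a size of 16 bytes and an alignment of 16 bytes.
pub fn vec4_sizes() -> (usize, usize) {
    (size_of::<MaterialVec4>(), wgsl_struct_size(16, 16))
}
```

Under these assumptions the vec3 version disagrees (12 bytes in Rust, 16 in WGSL) while the vec4 version agrees, which is why casting with bytemuck is only safe for the second kind of struct.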

WebGPU Reference: https://gpuweb.github.io/gpuweb/wgsl/#memory-layouts

Dec 29 '22 04:12 ScanMountGoat

Another challenge to address is whether these memory layout requirements also apply to vertex buffers. Some desktop applications tightly pack elements in a vertex buffer as described in #24. It's unclear if this same packed layout will work properly with storage buffer objects for applications such as transforming vertex data using a compute shader. The requirements may depend on the underlying backend since wgpu can be used with backends other than DX12 or Vulkan. Some APIs like WebGL or OpenGL ES require vertex attribute offsets to be aligned to a multiple of the field size similar to the layout requirements for host shareable types.
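To make the difference between a tightly packed vertex and the host-shareable layout concrete, here is a small sketch that computes field offsets from (size, alignment) pairs. The vertex with `position: vec3<f32>`, `normal: vec3<f32>`, and `uv: vec2<f32>` is an assumed example, and `FieldLayout` and `layout` are hypothetical names. The WGSL sizes and alignments used are those from the spec's memory layout table for the storage address space.

```rust
/// Size and alignment of one field, in bytes.
#[derive(Clone, Copy)]
pub struct FieldLayout {
    pub size: u32,
    pub align: u32,
}

fn round_up(align: u32, n: u32) -> u32 {
    (n + align - 1) / align * align
}

/// Lay the fields out in order. Each field starts at the next multiple of
/// its alignment, and the total size is rounded up to the largest alignment.
/// Returns (field offsets, total size).
pub fn layout(fields: &[FieldLayout]) -> (Vec<u32>, u32) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut end = 0;
    let mut max_align = 1;
    for field in fields {
        let offset = round_up(field.align, end);
        offsets.push(offset);
        end = offset + field.size;
        max_align = max_align.max(field.align);
    }
    (offsets, round_up(max_align, end))
}

/// Tightly packed vertex: every field only needs f32 (4-byte) alignment.
pub fn packed_vertex_layout() -> (Vec<u32>, u32) {
    layout(&[
        FieldLayout { size: 12, align: 4 }, // position: 3 x f32
        FieldLayout { size: 12, align: 4 }, // normal: 3 x f32
        FieldLayout { size: 8, align: 4 },  // uv: 2 x f32
    ])
}

/// The same fields using WGSL host-shareable sizes and alignments.
pub fn wgsl_vertex_layout() -> (Vec<u32>, u32) {
    layout(&[
        FieldLayout { size: 12, align: 16 }, // position: vec3<f32>
        FieldLayout { size: 12, align: 16 }, // normal: vec3<f32>
        FieldLayout { size: 8, align: 8 },   // uv: vec2<f32>
    ])
}
```

With these inputs the packed vertex has a 32-byte stride while the WGSL struct occupies 48 bytes, so the same bytes cannot be read as an array of that struct in a storage buffer without repacking.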

Dec 30 '22 22:12 ScanMountGoat

Using encase for vertex buffers may produce slightly different field offsets than the offsets currently being generated using memoffset. This can potentially cause issues when using the same vertex buffer data for storage buffers.

Dec 31 '22 23:12 ScanMountGoat

This is exactly what I've been experimenting with lately. I started using encase because bytemuck's alignment complaints were getting annoying. And encase's write() does indeed move some struct fields to different offsets.

Since the serialization format (the bytes inside the vertex buffer itself) and the bytemuck/encase derives are deeply connected, I think the way to go would be to replace the current derive_bytemuck and derive_encase options with an enum called SerializationFormat or something. Its variants could be Rust (i.e. simple structs, bytemuck, offsetof and size) and Encase (with ::SHADER_SIZE and friends). You might need to get in touch with the encase maintainer so they can open up the offset calculation somehow (or generate the code from their macro), so that you don't have to duplicate that calculation.
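A minimal sketch of what the SerializationFormat idea above might look like. The enum, its variants, and the `derives` method are hypothetical and do not exist in wgsl_to_wgpu. The derive paths it returns (`bytemuck::Pod`, `bytemuck::Zeroable`, `encase::ShaderType`) are the derive macros those crates provide.

```rust
/// Hypothetical replacement for separate `derive_bytemuck` / `derive_encase` flags.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SerializationFormat {
    /// Plain `#[repr(C)]` structs whose bytes are cast directly,
    /// with offsets and sizes taken from the Rust layout.
    Rust,
    /// Structs written through encase, with offsets and sizes
    /// following the WGSL layout rules.
    Encase,
}

impl SerializationFormat {
    /// Derive macros a generator could attach to each generated struct.
    pub fn derives(self) -> &'static [&'static str] {
        match self {
            SerializationFormat::Rust => &["bytemuck::Pod", "bytemuck::Zeroable"],
            SerializationFormat::Encase => &["encase::ShaderType"],
        }
    }
}
```

A single enum makes the two options mutually exclusive, so the generated offsets and the generated derives cannot disagree about which layout is in use.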

That said, right now encase seems to be the winner for me, because it does seem to guarantee all the alignment and size constraints of wgsl.

Jan 01 '23 11:01 badicsalex

I got a strange validation error using the following struct and bytemuck:

struct Material {
    color: vec3<f32>
}

This code does not work ❌ :

    let material = shader::Material {
        color: nalgebra::Vector3::new(1.0, 1.0, 0.0),
    };
    let uniform_buf = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Uniform Buffer"),
        contents: bytemuck::cast_slice(&[material]),
        usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
    });
    let bind_group0 = shader::bind_groups::BindGroup0::from_bindings(
        &device,
        shader::bind_groups::BindGroupLayout0 {
            material: BufferBinding { buffer: &uniform_buf, offset: 0, size: Some(NonZeroU64::new(12).unwrap()) },
        },
    );

thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In a RenderPass
      note: encoder = `<CommandBuffer-(0, 1, Vulkan)>`
    In a draw command, indexed:false indirect:false
      note: render pipeline = `Render Pipeline`
    Buffer is bound with size 12 where the shader expects 16 in group[0] compact index 0

But this code works ✅ (with https://github.com/teoxoy/encase/pull/23) :

    let material = shader::Material {
        color: nalgebra::Vector3::new(1.0, 1.0, 0.0),
    };
    let mut buffer = UniformBuffer::new(Vec::new());
    buffer.write(&material).unwrap();
    let bytes = buffer.into_inner();
    let uniform_buf = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Uniform Buffer"),
        contents: &bytes,
        usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
    });
    let bind_group0 = shader::bind_groups::BindGroup0::from_bindings(
        &device,
        shader::bind_groups::BindGroupLayout0 {
            material: BufferBinding { buffer: &uniform_buf, offset: 0, size: Some(Material::min_size()) },
        },
    );

Still, my understanding is that bytemuck::cast_slice() is much more efficient because it just casts the existing data, whereas the encase buffer needs to be filled with copied data. Correct?

Feb 07 '23 12:02 JMLX42

But this code works ✅ (with https://github.com/teoxoy/encase/pull/23) :

It is worth mentioning that Material::min_size() == 16 which, AFAIK is the goal when using encase vs bytemuck.

So if we ever wanted to use bytemuck alone and cast the struct directly, that (4 bytes in this case) padding would have to be described in the Rust struct generated by wgsl_to_wgpu. Correct?

Feb 07 '23 13:02 JMLX42

But this code works ✅ (with teoxoy/encase#23) :

encase is writing the data using a custom writer implementation for each type that manages padding and alignment. bytemuck is just casting the slice, which is very efficient but won't always follow WGSL's alignment rules. See the links on the first comment.

It is worth mentioning that Material::min_size() == 16 which, AFAIK is the goal when using encase vs bytemuck.

So if we ever wanted to use bytemuck alone and cast the struct directly, that (4 bytes in this case) padding would have to be described in the Rust struct generated by wgsl_to_wgpu. Correct?

You would need to add the appropriate padding to the Rust struct by either adding fields or just use vec4<f32> and mat4x4<f32> for everything in WGSL to avoid any alignment issues. For an example of how padding works in practice, see the link below. https://gpuweb.github.io/gpuweb/wgsl/#example-fc0bb4df
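As a sketch of the explicit padding approach, the struct below adds a trailing `_pad0` field so that its Rust size matches the 16 bytes the shader expects for `struct Material { color: vec3<f32> }`. The field name `_pad0` and the `to_bytes` helper are assumptions for illustration. In practice one would derive bytemuck's `Pod` and `Zeroable` and cast instead, which also requires the padding to be an explicit field because `Pod` types may not contain implicit padding bytes.

```rust
use std::mem::size_of;

/// Rust mirror of `struct Material { color: vec3<f32> }` with explicit padding.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Material {
    pub color: [f32; 3],
    /// Explicit padding so the struct is 16 bytes, matching WGSL.
    pub _pad0: f32,
}

impl Material {
    pub fn new(color: [f32; 3]) -> Self {
        Self { color, _pad0: 0.0 }
    }

    /// Copy the fields into a 16-byte little-endian buffer.
    /// The final 4 bytes stay zero and act as padding.
    pub fn to_bytes(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        for (i, component) in self.color.iter().enumerate() {
            out[i * 4..i * 4 + 4].copy_from_slice(&component.to_le_bytes());
        }
        out
    }
}

/// Size of the padded struct in bytes.
pub fn material_size() -> usize {
    size_of::<Material>()
}
```

The resulting buffer can be bound with a size of 16, which is the size the validation error above says the shader expects.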

Feb 07 '23 14:02 ScanMountGoat

It would be great if the generator explicitly disallowed vec3, or automatically added padding, because this seems like a very common pitfall.

Feb 07 '23 17:02 badicsalex

It would be great if the generator explicitly disallowed vec3, or automatically added padding, because this seems like a very common pitfall.

Would it make sense to add private fields of the required size to do that?

Feb 07 '23 18:02 JMLX42

It would be great if the generator explicitly disallowed vec3, or automatically added padding, because this seems like a very common pitfall.

I think disallowing vec3 completely is overly limiting. There are some valid usages of vec3 for structs describing vertex attributes on desktop platforms as discussed in #24. According to the spec, WGSL aligns vec3<f32> to 16 bytes in host-shareable types used for uniform buffers and storage buffers. That is the same alignment as vec4<f32>, so a vec3<f32> field often ends up occupying 16 bytes once padding is counted, which largely defeats the point of using 3 components in the first place.

If a type contains vec3 fields and is not used as a vertex input, wgsl_to_wgpu could return some sort of error. You could also make an argument that it's fine as long as the user is deriving encase. There are other cases requiring padding that don't involve types with an alignment greater than their size such as an f32 field followed by a vec4 field. This mostly impacts bytemuck since bytemuck is using the Rust struct's memory layout. I don't know if supporting bytemuck for all structs is a good idea due to the potential pitfalls. One approach is to only forbid types like vec3 when deriving bytemuck.
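The f32-followed-by-vec4 case mentioned above can be checked with plain Rust. In this sketch the `Params` struct and its field names are made up for illustration. WGSL would place a `vec4<f32>` member at offset 16 and give the struct a size of 32, whereas `#[repr(C)]` places the array right after the `f32`.

```rust
use std::mem::size_of;

/// Rust mirror of `struct Params { scale: f32, tint: vec4<f32> }` without padding.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Params {
    pub scale: f32,
    pub tint: [f32; 4],
}

/// The same struct with 12 bytes of explicit padding before `tint`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ParamsPadded {
    pub scale: f32,
    pub _pad0: [f32; 3],
    pub tint: [f32; 4],
}

/// Returns (offset of `tint`, struct size) for the unpadded struct.
pub fn unpadded_layout() -> (usize, usize) {
    let p = Params { scale: 0.0, tint: [0.0; 4] };
    let base = &p as *const Params as usize;
    let field = &p.tint as *const [f32; 4] as usize;
    (field - base, size_of::<Params>())
}

/// Returns (offset of `tint`, struct size) for the padded struct.
pub fn padded_layout() -> (usize, usize) {
    let p = ParamsPadded { scale: 0.0, _pad0: [0.0; 3], tint: [0.0; 4] };
    let base = &p as *const ParamsPadded as usize;
    let field = &p.tint as *const [f32; 4] as usize;
    (field - base, size_of::<ParamsPadded>())
}
```

Only the padded version reproduces the offset of 16 and size of 32 that WGSL expects, even though no field here has an alignment greater than its size.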

Feb 09 '23 05:02 ScanMountGoat

Would it make sense to add private fields of the required size to do that?

Private fields would involve adding new functions to each type to construct it. I'd like to keep each generated struct as plain data if possible.

Feb 09 '23 05:02 ScanMountGoat

One approach is to only forbid types like vec3 when deriving bytemuck

Won't we have the exact problem with array<f32, 3> (just an example)?

Feb 09 '23 14:02 JMLX42

The main problem is that the offsets of the Rust struct's fields won't always match the expected offsets of the WGSL types. This can also affect the expected size of the struct in some cases. Some types like vec3 are more likely to cause mismatches than others. There are a finite number of WGSL types and the generated Rust types are known, so we could validate the Rust struct layout in wgsl_to_wgpu. Deriving encase should solve the problem of layout, so there shouldn't be any need to validate the types that only derive encase. Types that derive bytemuck will need to be checked. Not enabling bytemuck or encase is an interesting case that may need to be handled in some way.
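One way such a validation step could report mismatches is sketched below. `FieldOffset`, `StructLayout`, and `validate_layout` are hypothetical names, and the offsets are passed in by hand. A real generator would compute the WGSL side from the shader and the Rust side from the generated types.

```rust
/// Byte offset of one named field.
pub struct FieldOffset {
    pub name: &'static str,
    pub offset: usize,
}

/// Field offsets and total size of a struct, in bytes.
pub struct StructLayout {
    pub fields: Vec<FieldOffset>,
    pub size: usize,
}

/// Compare a Rust struct layout with the layout WGSL expects and describe
/// the first mismatch found. Fields are compared in declaration order.
pub fn validate_layout(
    struct_name: &str,
    rust: &StructLayout,
    wgsl: &StructLayout,
) -> Result<(), String> {
    for (r, w) in rust.fields.iter().zip(wgsl.fields.iter()) {
        if r.offset != w.offset {
            return Err(format!(
                "offset of {}.{} is {} in Rust but {} in WGSL",
                struct_name, r.name, r.offset, w.offset
            ));
        }
    }
    if rust.size != wgsl.size {
        return Err(format!(
            "size of {} is {} in Rust but {} in WGSL",
            struct_name, rust.size, wgsl.size
        ));
    }
    Ok(())
}
```

Returning the struct and field name in the message points the user at the exact field that needs padding or a different type.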

Layout mismatches can be handled by automatically adding padding to the generated Rust struct or returning some sort of error. While nice in theory, I'm not sure how beneficial it is to add padding automatically to the Rust struct definitions. If a user has vec3<f32> in the shader and wgsl_to_wgpu adds an additional f32 worth of padding, the user could just as easily have changed the WGSL struct to use vec4<f32> instead.

Feb 10 '23 04:02 ScanMountGoat

The latest commit uses const assertions to compare the field offsets and size of each generated Rust struct with what WGSL expects. The assertions are only generated when enabling bytemuck for now. I still need to add some configuration for how vertex input structs are handled before I would consider this solved. The expected WGSL layout is calculated using naga::proc::Layouter, so it should be accurate.

A WGSL struct like this will generate an error when you try to compile the program.

struct Material {
    color: vec3<f32>
}

error[E0080]: evaluation of constant value failed
  --> example\src\shader.rs:20:15
   |
20 |   const _: () = assert!(
   |  _______________^
21 | |     std::mem::size_of:: < Material > () == 16, "size of Material does not match WGSL"
22 | | );
   | |_^ the evaluated program panicked at 'size of Material does not match WGSL', example\src\shader.rs:20:15
   |
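For comparison, here is a sketch of a const assertion of the same shape that passes. It assumes the WGSL struct was changed to `struct Material { color: vec4<f32> }` and that the generated Rust field is a 16-byte type, with `[f32; 4]` standing in for it here. The exact code wgsl_to_wgpu generates may differ.

```rust
/// Rust mirror of `struct Material { color: vec4<f32> }`.
/// `[f32; 4]` is an assumed stand-in for a 4-component vector type.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Material {
    pub color: [f32; 4],
}

// Evaluated at compile time: the build fails if the sizes disagree.
const _: () = assert!(
    std::mem::size_of::<Material>() == 16,
    "size of Material does not match WGSL"
);

/// Size of the Rust struct in bytes, exposed for runtime checks.
pub fn material_size() -> usize {
    std::mem::size_of::<Material>()
}
```

Because the check runs during compilation, a layout mismatch is caught before any buffer is created rather than as a validation error at draw time.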

Feb 10 '23 21:02 ScanMountGoat

Marking this as completed since the layout is validated for the required types on desktop now. I may look into potentially stricter alignment requirements for vertex inputs on the web, but that can be a separate issue later.

Apr 12 '23 21:04 ScanMountGoat
