wgpu Naga mesh shader SPIR-V writer

Connections Works towards #7197 Builds on #8370

Description

Add a SPIR-V writer for mesh shaders

All major changes here are in the SPIR-V backend for naga and naga snapshots. Other "changes" are inherited from #8370

Testing

Mesh shader test WGSL is now written to SPIR-V as a snapshot. Mesh shader tests & example use naga-written spirv on vulkan backend.

Squash or Rebase? Squash

Checklist

[x] Run cargo fmt.
[x] Run taplo format.
[x] Run cargo clippy --tests. If applicable, add:
- [x] --target wasm32-unknown-unknown
[x] Run cargo xtask test to run tests.
[x] If this contains user-facing changes, add a CHANGELOG.md entry.

Oct 31 '25 22:10 inner-daemons

Holdup what did I do

Nov 15 '25 04:11 inner-daemons

is this almost done ?

Nov 15 '25 07:11 amarhassam

@amarhassam

Haha yeah its basically done. I merged trunk a while back and now Im getting errors I can't understand (see commit log) but I will no doubt figure it out. I think @cwfitzgerald will get to reviewing it on the 26th, but it'll probably arrive after that.

Nov 15 '25 07:11 inner-daemons

@inner-daemons so its mainly you who is working on mesh shaders ?

Nov 15 '25 07:11 amarhassam

@amarhassam Yeah, I try to post updates to #7197 every now and then, and you can track the status there. I've had 2 people offer to write the HLSL writer and WGSL writer, but not much progress has been made on either I think. Otherwise everything is just me

Nov 15 '25 07:11 inner-daemons

@cwfitzgerald All comments should be addressed, ready for round 2! (=2)

Nov 26 '25 20:11 inner-daemons

I'll work on fixing the tests... at some point

Nov 27 '25 01:11 inner-daemons

For future me to enjoy and ponder:

	<call no='157' class='pipe_context' method='create_fs_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_FRAGMENT
source_blake3: {0x36b56488, 0x1d19aa4b, 0xcbbf732d, 0x58f338fc, 0x4ff49005, 0x051203b8, 0x5a5ae0b8, 0x945f3dc4}
inputs_read: 32-33
outputs_written: 4
per_primitive_inputs: 33
subgroup_size: 0
bit_sizes_float: 0x20
bit_sizes_int: 0x20
origin_upper_left: true
inputs: 2
outputs: 1
decl_var shader_in INTERP_MODE_NONE none vec4 color (VARYING_SLOT_VAR0.xyzw, 0, 0)
decl_var per_primitive shader_in INTERP_MODE_NONE none vec4 colorMask (VARYING_SLOT_VAR1.xyzw, 1, 0)
decl_var shader_out INTERP_MODE_NONE none vec4 #0 (FRAG_RESULT_DATA0.xyzw, 0, 0)
decl_function fs_main () (entrypoint)

impl fs_main {
    block b0:   // preds:
    32     %0 = deref_var &color (shader_in vec4)
    32x4   %1 = @load_deref (%0) (access=none)
    32     %2 = deref_var &colorMask (shader_in vec4)
    32x4   %3 = @load_deref (%2) (access=none)
    32     %4 = fmul %1.x, %3.x
    32     %5 = fmul %1.y, %3.y
    32     %6 = fmul %1.z, %3.z
    32     %7 = fmul %1.w, %3.w
    32     %8 = deref_var &#0 (shader_out vec4)
    32x4   %9 = vec4 %4, %5, %6, %7
                @store_deref (%8, %9) (wrmask=xyzw, access=none)
                // succs: b1
    block b1:
}

]]></string></member><member name='stream_output'><struct name='pipe_stream_output_info'><member name='num_outputs'><uint>0</uint></member><member name='stride'><array><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem></array></member><member name='output'><array></array></member></struct></member></struct></arg>
		<ret><ptr>0x78e9e1da32f0</ptr></ret>
		<time><int>72</int></time>
	</call>
	<call no='158' class='pipe_context' method='create_ts_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_TASK
source_blake3: {0xa93be064, 0x9f407d12, 0x5b23394f, 0xc5a9b7f6, 0x9146aa9a, 0xd2190e79, 0x0df20856, 0xdec646c8}
workgroup_size: 1, 1, 1
shared_size: 4
task_payload_size: 20
subgroup_size: 0
Unhandled stage 6
decl_var task_payload INTERP_MODE_NONE none TaskPayload taskPayload
decl_var shared INTERP_MODE_NONE none float workgroupData = null
decl_function ts_main () (entrypoint)

impl ts_main {
    block b0:  // preds:
    32    %0 = load_const (0x3f800000 = 1.000000 = 1065353216)
    32    %1 = load_const (0x00000000)
               @store_shared (%0 (0x3f800000), %1 (0x0)) (base=0, wrmask=x, align_mul=256, align_offset=0)
    32x4  %2 = load_const (0x3f800000, 0x3f800000, 0x00000000, 0x3f800000) = (1.000000, 1.000000, 0.000000, 1.000000) = (1065353216, 1065353216, 0, 1065353216)
               @store_task_payload (%2 (0x3f800000, 0x3f800000, 0x0, 0x3f800000), %1 (0x0)) (base=0, wrmask=xyzw, align_mul=256, align_offset=0)
    32    %3 = load_const (0x00000010 = 16)
    32    %4 = load_const (0x00000001 = 0.000000)
               @store_task_payload (%4 (0x1), %3 (0x10)) (base=0, wrmask=x, align_mul=256, align_offset=16)
    32x3  %5 = load_const (0x00000003, 0x00000001, 0x00000001) = (0.000000, 0.000000, 0.000000)
               @launch_mesh_workgroups (%5 (0x3, 0x1, 0x1)) (base=0, range=20)
               // succs: b1
    block b1:
}

]]></string></member><member name='stream_output'><struct name='pipe_stream_output_info'><member name='num_outputs'><uint>0</uint></member><member name='stride'><array><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem></array></member><member name='output'><array></array></member></struct></member></struct></arg>
		<ret><ptr>0x78e9e1d87610</ptr></ret>
		<time><int>25</int></time>
	</call>
	<call no='159' class='pipe_context' method='create_ms_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_MESH
source_blake3: {0xa41a4a8c, 0xdb892a32, 0x86c55f33, 0x112e676c, 0xd27ecff8, 0x46a99a32, 0x1337b67d, 0xcc785c3c}
workgroup_size: 1, 1, 1
outputs_written: 0,27-28,32-33
system_values_read: 0x00000000'00000000'00000800'00000000
per_primitive_outputs: 27-28,33
shared_size: 140
task_payload_size: 20
subgroup_size: 0
bit_sizes_int: 0x21
ms_cross_invocation_output_access: 0,27-28,32-33
max_vertices_out: 3
max_primitives_out: 1
primitive_type: TRIANGLES
outputs: 5
decl_var shader_out INTERP_MODE_NONE none vec4[3] naga_vertex_builtin_outputs[*].field0 (VARYING_SLOT_POS.xyzw, 0, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none uvec3[1] naga_primitive_indices_outputs (VARYING_SLOT_PRIMITIVE_INDICES.xyz, 2, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none bool[1] naga_primitive_builtin_outputs[*].field0 (VARYING_SLOT_CULL_PRIMITIVE.x, 3, 0)
decl_var shader_out INTERP_MODE_NONE none vec4[3] #0 (VARYING_SLOT_VAR0.xyzw, 1, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none vec4[1] #1 (VARYING_SLOT_VAR1.xyzw, 4, 0)
decl_var task_payload INTERP_MODE_NONE none TaskPayload taskPayload
decl_var shared INTERP_MODE_NONE none float workgroupData = null
decl_var shared INTERP_MODE_NONE none MeshOutput mesh_output = null
decl_function ms_main () (entrypoint)

impl ms_main {
    block b0:   // preds:
    32     %0 = undefined
    32     %1 = @load_local_invocation_index
    32     %2 = load_const (0x00000004)
    32     %3 = load_const (0x00000084 = 132)
    32     %4 = load_const (0x00000003)
                @store_shared (%4 (0x3), %3 (0x84)) (base=0, wrmask=x, align_mul=256, align_offset=132)
    32     %5 = load_const (0x00000088 = 136)
    32     %6 = load_const (0x00000001)
                @store_shared (%6 (0x1), %5 (0x88)) (base=0, wrmask=x, align_mul=256, align_offset=136)
    32     %7 = load_const (0x40000000 = 2.000000 = 1073741824)
    32     %8 = load_const (0x00000000)
                @store_shared (%7 (0x40000000), %8 (0x0)) (base=0, wrmask=x, align_mul=256, align_offset=0)
    32     %9 = load_const (0x00000005)
    32x4  %10 = load_const (0x00000000, 0x3f800000, 0x00000000, 0x3f800000) = (0.000000, 1.000000, 0.000000, 1.000000) = (0, 1065353216, 0, 1065353216)
                @store_shared (%10 (0x0, 0x3f800000, 0x0, 0x3f800000), %2 (0x4)) (base=0, wrmask=xyzw, align_mul=256, align_offset=4)
    32x4  %11 = @load_task_payload (%8 (0x0)) (base=0, align_mul=256, align_offset=0)
    32x4  %12 = vec4 %8 (0x0), %11.y, %8 (0x0), %11.w
    32    %13 = load_const (0x00000010 = 16)
    32    %14 = load_const (0x00000014 = 20)
                @store_shared (%12, %14 (0x14)) (base=0, wrmask=xyzw, align_mul=256, align_offset=20)
    32    %15 = load_const (0x00000024 = 36)
    32x4  %16 = load_const (0xbf800000, 0xbf800000, 0x00000000, 0x3f800000) = (-1.000000, -1.000000, 0.000000, 1.000000) = (-1082130432, -1082130432, +0, +1065353216) = (3212836864, 3212836864, 0, 1065353216)
                @store_shared (%16 (0xbf800000, 0xbf800000, 0x0, 0x3f800000), %15 (0x24)) (base=0, wrmask=xyzw, align_mul=256, align_offset=36)
    32x4  %17 = vec4 %8 (0x0), %8 (0x0), %11.z, %11.w
    32    %18 = load_const (0x00000034 = 52)
                @store_shared (%17, %18 (0x34)) (base=0, wrmask=xyzw, align_mul=256, align_offset=52)
    32    %19 = load_const (0x00000044 = 68)
    32x4  %20 = load_const (0x3f800000, 0xbf800000, 0x00000000, 0x3f800000) = (1.000000, -1.000000, 0.000000, 1.000000) = (+1065353216, -1082130432, +0, +1065353216) = (1065353216, 3212836864, 0, 1065353216)
                @store_shared (%20 (0x3f800000, 0xbf800000, 0x0, 0x3f800000), %19 (0x44)) (base=0, wrmask=xyzw, align_mul=256, align_offset=68)
    32x4  %21 = vec4 %11.x, %8 (0x0), %8 (0x0), %11.w
    32    %22 = load_const (0x00000054 = 84)
                @store_shared (%21, %22 (0x54)) (base=0, wrmask=xyzw, align_mul=256, align_offset=84)
    32    %23 = load_const (0x00000064 = 100)
    32x3  %24 = load_const (0x00000000, 0x00000001, 0x00000002) = (0.000000, 0.000000, 0.000000)
                @store_shared (%24 (0x0, 0x1, 0x2), %23 (0x64)) (base=0, wrmask=xyz, align_mul=256, align_offset=100)
    32    %25 = @load_task_payload (%13 (0x10)) (base=0, align_mul=256, align_offset=16)
    1     %26 = ieq %25, %8 (0x0)
    32    %27 = load_const (0x00000070 = 112)
    32    %28 = b2b32 %26
                @store_shared (%28, %27 (0x70)) (base=0, wrmask=x, align_mul=256, align_offset=112)
    32    %29 = load_const (0x00000074 = 116)
    32x4  %30 = load_const (0x3f800000, 0x00000000, 0x3f800000, 0x3f800000) = (1.000000, 0.000000, 1.000000, 1.000000) = (1065353216, 0, 1065353216, 1065353216)
                @store_shared (%30 (0x3f800000, 0x0, 0x3f800000, 0x3f800000), %29 (0x74)) (base=0, wrmask=xyzw, align_mul=256, align_offset=116)
                @barrier (execution_scope=WORKGROUP, memory_scope=WORKGROUP, mem_semantics=ACQ|REL, mem_modes=shader_out|shared)
    32    %31 = @load_shared (%3 (0x84)) (base=0, align_mul=256, align_offset=132)
    32    %32 = umin %31, %4 (0x3)
    32    %33 = @load_shared (%5 (0x88)) (base=0, align_mul=256, align_offset=136)
    32    %34 = umin %33, %6 (0x1)
                @set_vertex_and_primitive_count (%32, %34, %0) (stream_id=0)
                // succs: b1
    loop {
        block b1:   // preds: b0 b4
        32    %35 = phi b0: %1, b4: %46
        1     %36 = uge %35, %32
                    // succs: b2 b3
        if %36 {
            block b2:// preds: b1
            break
            // succs: b5
        } else {
            block b3:  // preds: b1, succs: b4
        }
        block b4:   // preds: b3
        32    %37 = ishl %35, %9 (0x5)
        32    %38 = iadd %2 (0x4), %37
        32x4  %39 = @load_shared (%38) (base=0, align_mul=32, align_offset=4)
        32    %40 = iadd %14 (0x14), %37
        32x4  %41 = @load_shared (%40) (base=0, align_mul=32, align_offset=20)
        32    %42 = deref_var &naga_vertex_builtin_outputs[*].field0 (shader_out vec4[3])
        32    %43 = deref_array &(*%42)[%35] (shader_out vec4)  // &naga_vertex_builtin_outputs[*].field0[%35]
                    @store_deref (%43, %39) (wrmask=xyzw, access=none)
        32    %44 = deref_var &#0 (shader_out vec4[3])
        32    %45 = deref_array &(*%44)[%35] (shader_out vec4)  // &#0[%35]
                    @store_deref (%45, %41) (wrmask=xyzw, access=none)
        32    %46 = iadd %35, %6 (0x1)
                    // succs: b1
    }
    block b5:  // preds: b2, succs: b6
    loop {
        block b6:   // preds: b5 b9
        32    %47 = phi b5: %1, b9: %63
        1     %48 = uge %47, %34
                    // succs: b7 b8
        if %48 {
            block b7:// preds: b6
            break
            // succs: b10
        } else {
            block b8:  // preds: b6, succs: b9
        }
        block b9:   // preds: b8
        32    %49 = ishl %47, %9 (0x5)
        32    %50 = iadd %23 (0x64), %49
        32x3  %51 = @load_shared (%50) (base=0, align_mul=32, align_offset=4)
        32    %52 = iadd %27 (0x70), %49
        32    %53 = @load_shared (%52) (base=0, align_mul=32, align_offset=16)
        1     %54 = b2b1 %53
        32    %55 = iadd %29 (0x74), %49
        32x4  %56 = @load_shared (%55) (base=0, align_mul=32, align_offset=20)
        32    %57 = deref_var &naga_primitive_indices_outputs (shader_out uvec3[1])
        32    %58 = deref_array &(*%57)[%47] (shader_out uvec3)  // &naga_primitive_indices_outputs[%47]
                    @store_deref (%58, %51) (wrmask=xyz, access=none)
        32    %59 = deref_var &naga_primitive_builtin_outputs[*].field0 (shader_out bool[1])
        32    %60 = deref_array &(*%59)[%47] (shader_out bool)  // &naga_primitive_builtin_outputs[*].field0[%47]
                    @store_deref (%60, %54) (wrmask=x, access=none)
        32    %61 = deref_var &#1 (shader_out vec4[1])
        32    %62 = deref_array &(*%61)[%47] (shader_out vec4)  // &#1[%47]
                    @store_deref (%62, %56) (wrmask=xyzw, access=none)
        32    %63 = iadd %47, %6 (0x1)
                    // succs: b6
    }
    block b10:  // preds: b7, succs: b11
    block b11:
}

]]>

Nov 27 '25 23:11 inner-daemons

I also have some AMD GPU code from windows that I'll upload at some point but its absolutely unreadable to my eyes.

Nov 27 '25 23:11 inner-daemons

For future reference: I have taken the generated code with debug symbols and put it through SPIRV-Opt, I have put it through SPIRV-Cross and then back through glslc, and both times it still caused a bug in LLVMPipe. Not sure about AMD however.

I have also narrowed it down to just being the mesh shader part.

Nov 28 '25 04:11 inner-daemons

Update: got this to work by modifying the body of main() in the generated GLSL. Therefore, the interface is fine, and its just an issue with something goofy.

Nov 28 '25 04:11 inner-daemons

These comments are spammy if anyone is listening so you don't need to

I have narrowed it down to this little bit of code that breaks stuff even if the values are immediately rewritten:

  bool cull = mesh_output.primitives[gl_LocalInvocationIndex].cull;
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

Notably, it has to be referenced by gl_LocalInvocationIndex to break, even though that should always be 0. It still results in the following error:

(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never usedLLVM ERROR: Cannot emit physreg copy instruction

I'm beginning to believe this really is a mesa bug

Nov 28 '25 04:11 inner-daemons

The following when replacing the body of the generated GLSL makes it terminate with sigsegv:

void main() {
  mesh_output.primitives[0u].cull = !taskPayload.visible;

  SetMeshOutputsEXT(3, 1);

  bool cull = mesh_output.primitives[gl_LocalInvocationIndex].cull;
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

  gl_MeshVerticesEXT[0].gl_Position = vec4(0.0, 1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[1].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[2].gl_Position = vec4(1.0, -1.0, 0.0, 1.0);

  _43[0]._m0 = vec4(0.0, 1.0, 0.0, 1.0) * taskPayload.colorMask;
  _43[1]._m0 = vec4(0.0, 0.0, 1.0, 1.0) * taskPayload.colorMask;
  _43[2]._m0 = vec4(1.0, 0.0, 0.0, 1.0) * taskPayload.colorMask;

  gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
  _50[0]._m0 = vec4(1.0, 0.0, 1.0, 1.0);
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = false;
}

It doesn't terminate when gl_LocalInvocationIndex isn't used to index mesh_output.primitives, even if it is used elsewhere

Nov 28 '25 05:11 inner-daemons

#version 450
#extension GL_EXT_mesh_shader : require
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout(max_vertices = 3, max_primitives = 1, triangles) out;

struct TaskPayload {
  vec4 colorMask;
  bool visible;
};

out _40 { layout(location = 0) vec4 _m0; }
_43[3];

perprimitiveEXT out _47 { layout(location = 1) vec4 _m0; }
_50[1];

taskPayloadSharedEXT TaskPayload taskPayload;

shared bool array[3];

void main() {
  array[0] = !taskPayload.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

  SetMeshOutputsEXT(3, 1);

  gl_MeshVerticesEXT[0].gl_Position = vec4(0.0, 1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[1].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[2].gl_Position = vec4(1.0, -1.0, 0.0, 1.0);

  _43[0]._m0 = vec4(0.0, 1.0, 0.0, 1.0) * taskPayload.colorMask;
  _43[1]._m0 = vec4(0.0, 0.0, 1.0, 1.0) * taskPayload.colorMask;
  _43[2]._m0 = vec4(1.0, 0.0, 0.0, 1.0) * taskPayload.colorMask;

  gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
  _50[0]._m0 = vec4(1.0, 0.0, 1.0, 1.0);
}

This example is broken but only because of the first 3 lines in main: if you chain the indexing to use 0 it works fine, and if you remove it it works fine.

Nov 28 '25 05:11 inner-daemons

If I change naga's logic to use


        // Current index to copy
        let val_i = if is_primitive {
            self.get_constant_scalar(crate::Literal::U32(0))
        } else {
            let val_i = self.id_gen.next();
            body.push(Instruction::load(u32_type_id, val_i, index_var, None));
            val_i
        };

when deciding which index to copy over, it actually passes all of the tests except for 2: wgpu_gpu::mesh_shader::mesh_pipeline_basic_mesh and wgpu_gpu::mesh_shader::mesh_pipeline_basic_mesh_frag both consistently segfault while nothing else does.

So in conclusion I have no fucking clue what's going on.

Nov 28 '25 05:11 inner-daemons

This shader

#version 450
#extension GL_EXT_mesh_shader : require

const vec4[3] positions = {vec4(0., 1.0, 0., 1.0), vec4(-1.0, -1.0, 0., 1.0),
                           vec4(1.0, -1.0, 0., 1.0)};
const vec4[3] colors = {vec4(0., 1., 0., 1.), vec4(0., 0., 1., 1.),
                        vec4(1., 0., 0., 1.)};

// This is an inefficient workgroup size.Ideally the total thread count would be
// a multiple of 64
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
struct PayloadData {
  vec4 colorMask;
  bool visible;
};
taskPayloadSharedEXT PayloadData payloadData;

out VertexOutput { layout(location = 0) vec4 color; }
vertexOutput[];
layout(location = 1) perprimitiveEXT out PrimitiveOutput { vec4 colorMask; }
primitiveOutput[];

layout(triangles, max_vertices = 3, max_primitives = 1) out;

shared bool array[3];

void main() {
  SetMeshOutputsEXT(3, 1);

  gl_MeshVerticesEXT[0].gl_Position = positions[0];
  gl_MeshVerticesEXT[1].gl_Position = positions[1];
  gl_MeshVerticesEXT[2].gl_Position = positions[2];

  vertexOutput[0].color = colors[0] * payloadData.colorMask;
  vertexOutput[1].color = colors[1] * payloadData.colorMask;
  vertexOutput[2].color = colors[2] * payloadData.colorMask;

  gl_PrimitiveTriangleIndicesEXT[gl_LocalInvocationIndex] = uvec3(0, 1, 2);
  primitiveOutput[0].colorMask = vec4(1.0, 0.0, 1.0, 1.0);

  array[0] = !payloadData.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;
}

Which is a copy-paste of the fully functional SPIR-V that works on NVIDIA, AMD, and LLVMPIPE, except for the shared bool array[3]; and

  array[0] = !payloadData.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

fails to run on LLVMPIPE. LLVMPIPE segfaults here.

Nov 28 '25 05:11 inner-daemons

So that I remember: the tests started failing between 4b7ba3adc and 65be73b982dd591201dddf8225767b9d2faff7bd

Nov 28 '25 07:11 inner-daemons