wgpu icon indicating copy to clipboard operation
wgpu copied to clipboard

Naga mesh shader SPIR-V writer

Open inner-daemons opened this issue 1 month ago • 17 comments

Connections Works towards #7197 Builds on #8370

Description

Add a SPIR-V writer for mesh shaders

All major changes here are in the SPIR-V backend for naga and naga snapshots. Other "changes" are inherited from #8370

Testing

Mesh shader test WGSL is now written to SPIR-V as a snapshot. Mesh shader tests & example use naga-written spirv on vulkan backend.

Squash or Rebase? Squash

Checklist

  • [x] Run cargo fmt.
  • [x] Run taplo format.
  • [x] Run cargo clippy --tests. If applicable, add:
    • [x] --target wasm32-unknown-unknown
  • [x] Run cargo xtask test to run tests.
  • [x] If this contains user-facing changes, add a CHANGELOG.md entry.

inner-daemons avatar Oct 31 '25 22:10 inner-daemons

Holdup what did I do

inner-daemons avatar Nov 15 '25 04:11 inner-daemons

is this almost done ?

amarhassam avatar Nov 15 '25 07:11 amarhassam

@amarhassam

Haha yeah its basically done. I merged trunk a while back and now Im getting errors I can't understand (see commit log) but I will no doubt figure it out. I think @cwfitzgerald will get to reviewing it on the 26th, but it'll probably arrive after that.

inner-daemons avatar Nov 15 '25 07:11 inner-daemons

@inner-daemons so its mainly you who is working on mesh shaders ?

amarhassam avatar Nov 15 '25 07:11 amarhassam

@amarhassam Yeah, I try to post updates to #7197 every now and then, and you can track the status there. I've had 2 people offer to write the HLSL writer and WGSL writer, but not much progress has been made on either I think. Otherwise everything is just me

inner-daemons avatar Nov 15 '25 07:11 inner-daemons

@cwfitzgerald All comments should be addressed, ready for round 2! (=2)

inner-daemons avatar Nov 26 '25 20:11 inner-daemons

I'll work on fixing the tests... at some point

inner-daemons avatar Nov 27 '25 01:11 inner-daemons

For future me to enjoy and ponder:

	<call no='157' class='pipe_context' method='create_fs_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_FRAGMENT
source_blake3: {0x36b56488, 0x1d19aa4b, 0xcbbf732d, 0x58f338fc, 0x4ff49005, 0x051203b8, 0x5a5ae0b8, 0x945f3dc4}
inputs_read: 32-33
outputs_written: 4
per_primitive_inputs: 33
subgroup_size: 0
bit_sizes_float: 0x20
bit_sizes_int: 0x20
origin_upper_left: true
inputs: 2
outputs: 1
decl_var shader_in INTERP_MODE_NONE none vec4 color (VARYING_SLOT_VAR0.xyzw, 0, 0)
decl_var per_primitive shader_in INTERP_MODE_NONE none vec4 colorMask (VARYING_SLOT_VAR1.xyzw, 1, 0)
decl_var shader_out INTERP_MODE_NONE none vec4 #0 (FRAG_RESULT_DATA0.xyzw, 0, 0)
decl_function fs_main () (entrypoint)

impl fs_main {
    block b0:   // preds:
    32     %0 = deref_var &color (shader_in vec4)
    32x4   %1 = @load_deref (%0) (access=none)
    32     %2 = deref_var &colorMask (shader_in vec4)
    32x4   %3 = @load_deref (%2) (access=none)
    32     %4 = fmul %1.x, %3.x
    32     %5 = fmul %1.y, %3.y
    32     %6 = fmul %1.z, %3.z
    32     %7 = fmul %1.w, %3.w
    32     %8 = deref_var &#0 (shader_out vec4)
    32x4   %9 = vec4 %4, %5, %6, %7
                @store_deref (%8, %9) (wrmask=xyzw, access=none)
                // succs: b1
    block b1:
}

]]></string></member><member name='stream_output'><struct name='pipe_stream_output_info'><member name='num_outputs'><uint>0</uint></member><member name='stride'><array><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem></array></member><member name='output'><array></array></member></struct></member></struct></arg>
		<ret><ptr>0x78e9e1da32f0</ptr></ret>
		<time><int>72</int></time>
	</call>
	<call no='158' class='pipe_context' method='create_ts_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_TASK
source_blake3: {0xa93be064, 0x9f407d12, 0x5b23394f, 0xc5a9b7f6, 0x9146aa9a, 0xd2190e79, 0x0df20856, 0xdec646c8}
workgroup_size: 1, 1, 1
shared_size: 4
task_payload_size: 20
subgroup_size: 0
Unhandled stage 6
decl_var task_payload INTERP_MODE_NONE none TaskPayload taskPayload
decl_var shared INTERP_MODE_NONE none float workgroupData = null
decl_function ts_main () (entrypoint)

impl ts_main {
    block b0:  // preds:
    32    %0 = load_const (0x3f800000 = 1.000000 = 1065353216)
    32    %1 = load_const (0x00000000)
               @store_shared (%0 (0x3f800000), %1 (0x0)) (base=0, wrmask=x, align_mul=256, align_offset=0)
    32x4  %2 = load_const (0x3f800000, 0x3f800000, 0x00000000, 0x3f800000) = (1.000000, 1.000000, 0.000000, 1.000000) = (1065353216, 1065353216, 0, 1065353216)
               @store_task_payload (%2 (0x3f800000, 0x3f800000, 0x0, 0x3f800000), %1 (0x0)) (base=0, wrmask=xyzw, align_mul=256, align_offset=0)
    32    %3 = load_const (0x00000010 = 16)
    32    %4 = load_const (0x00000001 = 0.000000)
               @store_task_payload (%4 (0x1), %3 (0x10)) (base=0, wrmask=x, align_mul=256, align_offset=16)
    32x3  %5 = load_const (0x00000003, 0x00000001, 0x00000001) = (0.000000, 0.000000, 0.000000)
               @launch_mesh_workgroups (%5 (0x3, 0x1, 0x1)) (base=0, range=20)
               // succs: b1
    block b1:
}

]]></string></member><member name='stream_output'><struct name='pipe_stream_output_info'><member name='num_outputs'><uint>0</uint></member><member name='stride'><array><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem></array></member><member name='output'><array></array></member></struct></member></struct></arg>
		<ret><ptr>0x78e9e1d87610</ptr></ret>
		<time><int>25</int></time>
	</call>
	<call no='159' class='pipe_context' method='create_ms_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_MESH
source_blake3: {0xa41a4a8c, 0xdb892a32, 0x86c55f33, 0x112e676c, 0xd27ecff8, 0x46a99a32, 0x1337b67d, 0xcc785c3c}
workgroup_size: 1, 1, 1
outputs_written: 0,27-28,32-33
system_values_read: 0x00000000'00000000'00000800'00000000
per_primitive_outputs: 27-28,33
shared_size: 140
task_payload_size: 20
subgroup_size: 0
bit_sizes_int: 0x21
ms_cross_invocation_output_access: 0,27-28,32-33
max_vertices_out: 3
max_primitives_out: 1
primitive_type: TRIANGLES
outputs: 5
decl_var shader_out INTERP_MODE_NONE none vec4[3] naga_vertex_builtin_outputs[*].field0 (VARYING_SLOT_POS.xyzw, 0, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none uvec3[1] naga_primitive_indices_outputs (VARYING_SLOT_PRIMITIVE_INDICES.xyz, 2, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none bool[1] naga_primitive_builtin_outputs[*].field0 (VARYING_SLOT_CULL_PRIMITIVE.x, 3, 0)
decl_var shader_out INTERP_MODE_NONE none vec4[3] #0 (VARYING_SLOT_VAR0.xyzw, 1, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none vec4[1] #1 (VARYING_SLOT_VAR1.xyzw, 4, 0)
decl_var task_payload INTERP_MODE_NONE none TaskPayload taskPayload
decl_var shared INTERP_MODE_NONE none float workgroupData = null
decl_var shared INTERP_MODE_NONE none MeshOutput mesh_output = null
decl_function ms_main () (entrypoint)

impl ms_main {
    block b0:   // preds:
    32     %0 = undefined
    32     %1 = @load_local_invocation_index
    32     %2 = load_const (0x00000004)
    32     %3 = load_const (0x00000084 = 132)
    32     %4 = load_const (0x00000003)
                @store_shared (%4 (0x3), %3 (0x84)) (base=0, wrmask=x, align_mul=256, align_offset=132)
    32     %5 = load_const (0x00000088 = 136)
    32     %6 = load_const (0x00000001)
                @store_shared (%6 (0x1), %5 (0x88)) (base=0, wrmask=x, align_mul=256, align_offset=136)
    32     %7 = load_const (0x40000000 = 2.000000 = 1073741824)
    32     %8 = load_const (0x00000000)
                @store_shared (%7 (0x40000000), %8 (0x0)) (base=0, wrmask=x, align_mul=256, align_offset=0)
    32     %9 = load_const (0x00000005)
    32x4  %10 = load_const (0x00000000, 0x3f800000, 0x00000000, 0x3f800000) = (0.000000, 1.000000, 0.000000, 1.000000) = (0, 1065353216, 0, 1065353216)
                @store_shared (%10 (0x0, 0x3f800000, 0x0, 0x3f800000), %2 (0x4)) (base=0, wrmask=xyzw, align_mul=256, align_offset=4)
    32x4  %11 = @load_task_payload (%8 (0x0)) (base=0, align_mul=256, align_offset=0)
    32x4  %12 = vec4 %8 (0x0), %11.y, %8 (0x0), %11.w
    32    %13 = load_const (0x00000010 = 16)
    32    %14 = load_const (0x00000014 = 20)
                @store_shared (%12, %14 (0x14)) (base=0, wrmask=xyzw, align_mul=256, align_offset=20)
    32    %15 = load_const (0x00000024 = 36)
    32x4  %16 = load_const (0xbf800000, 0xbf800000, 0x00000000, 0x3f800000) = (-1.000000, -1.000000, 0.000000, 1.000000) = (-1082130432, -1082130432, +0, +1065353216) = (3212836864, 3212836864, 0, 1065353216)
                @store_shared (%16 (0xbf800000, 0xbf800000, 0x0, 0x3f800000), %15 (0x24)) (base=0, wrmask=xyzw, align_mul=256, align_offset=36)
    32x4  %17 = vec4 %8 (0x0), %8 (0x0), %11.z, %11.w
    32    %18 = load_const (0x00000034 = 52)
                @store_shared (%17, %18 (0x34)) (base=0, wrmask=xyzw, align_mul=256, align_offset=52)
    32    %19 = load_const (0x00000044 = 68)
    32x4  %20 = load_const (0x3f800000, 0xbf800000, 0x00000000, 0x3f800000) = (1.000000, -1.000000, 0.000000, 1.000000) = (+1065353216, -1082130432, +0, +1065353216) = (1065353216, 3212836864, 0, 1065353216)
                @store_shared (%20 (0x3f800000, 0xbf800000, 0x0, 0x3f800000), %19 (0x44)) (base=0, wrmask=xyzw, align_mul=256, align_offset=68)
    32x4  %21 = vec4 %11.x, %8 (0x0), %8 (0x0), %11.w
    32    %22 = load_const (0x00000054 = 84)
                @store_shared (%21, %22 (0x54)) (base=0, wrmask=xyzw, align_mul=256, align_offset=84)
    32    %23 = load_const (0x00000064 = 100)
    32x3  %24 = load_const (0x00000000, 0x00000001, 0x00000002) = (0.000000, 0.000000, 0.000000)
                @store_shared (%24 (0x0, 0x1, 0x2), %23 (0x64)) (base=0, wrmask=xyz, align_mul=256, align_offset=100)
    32    %25 = @load_task_payload (%13 (0x10)) (base=0, align_mul=256, align_offset=16)
    1     %26 = ieq %25, %8 (0x0)
    32    %27 = load_const (0x00000070 = 112)
    32    %28 = b2b32 %26
                @store_shared (%28, %27 (0x70)) (base=0, wrmask=x, align_mul=256, align_offset=112)
    32    %29 = load_const (0x00000074 = 116)
    32x4  %30 = load_const (0x3f800000, 0x00000000, 0x3f800000, 0x3f800000) = (1.000000, 0.000000, 1.000000, 1.000000) = (1065353216, 0, 1065353216, 1065353216)
                @store_shared (%30 (0x3f800000, 0x0, 0x3f800000, 0x3f800000), %29 (0x74)) (base=0, wrmask=xyzw, align_mul=256, align_offset=116)
                @barrier (execution_scope=WORKGROUP, memory_scope=WORKGROUP, mem_semantics=ACQ|REL, mem_modes=shader_out|shared)
    32    %31 = @load_shared (%3 (0x84)) (base=0, align_mul=256, align_offset=132)
    32    %32 = umin %31, %4 (0x3)
    32    %33 = @load_shared (%5 (0x88)) (base=0, align_mul=256, align_offset=136)
    32    %34 = umin %33, %6 (0x1)
                @set_vertex_and_primitive_count (%32, %34, %0) (stream_id=0)
                // succs: b1
    loop {
        block b1:   // preds: b0 b4
        32    %35 = phi b0: %1, b4: %46
        1     %36 = uge %35, %32
                    // succs: b2 b3
        if %36 {
            block b2:// preds: b1
            break
            // succs: b5
        } else {
            block b3:  // preds: b1, succs: b4
        }
        block b4:   // preds: b3
        32    %37 = ishl %35, %9 (0x5)
        32    %38 = iadd %2 (0x4), %37
        32x4  %39 = @load_shared (%38) (base=0, align_mul=32, align_offset=4)
        32    %40 = iadd %14 (0x14), %37
        32x4  %41 = @load_shared (%40) (base=0, align_mul=32, align_offset=20)
        32    %42 = deref_var &naga_vertex_builtin_outputs[*].field0 (shader_out vec4[3])
        32    %43 = deref_array &(*%42)[%35] (shader_out vec4)  // &naga_vertex_builtin_outputs[*].field0[%35]
                    @store_deref (%43, %39) (wrmask=xyzw, access=none)
        32    %44 = deref_var &#0 (shader_out vec4[3])
        32    %45 = deref_array &(*%44)[%35] (shader_out vec4)  // &#0[%35]
                    @store_deref (%45, %41) (wrmask=xyzw, access=none)
        32    %46 = iadd %35, %6 (0x1)
                    // succs: b1
    }
    block b5:  // preds: b2, succs: b6
    loop {
        block b6:   // preds: b5 b9
        32    %47 = phi b5: %1, b9: %63
        1     %48 = uge %47, %34
                    // succs: b7 b8
        if %48 {
            block b7:// preds: b6
            break
            // succs: b10
        } else {
            block b8:  // preds: b6, succs: b9
        }
        block b9:   // preds: b8
        32    %49 = ishl %47, %9 (0x5)
        32    %50 = iadd %23 (0x64), %49
        32x3  %51 = @load_shared (%50) (base=0, align_mul=32, align_offset=4)
        32    %52 = iadd %27 (0x70), %49
        32    %53 = @load_shared (%52) (base=0, align_mul=32, align_offset=16)
        1     %54 = b2b1 %53
        32    %55 = iadd %29 (0x74), %49
        32x4  %56 = @load_shared (%55) (base=0, align_mul=32, align_offset=20)
        32    %57 = deref_var &naga_primitive_indices_outputs (shader_out uvec3[1])
        32    %58 = deref_array &(*%57)[%47] (shader_out uvec3)  // &naga_primitive_indices_outputs[%47]
                    @store_deref (%58, %51) (wrmask=xyz, access=none)
        32    %59 = deref_var &naga_primitive_builtin_outputs[*].field0 (shader_out bool[1])
        32    %60 = deref_array &(*%59)[%47] (shader_out bool)  // &naga_primitive_builtin_outputs[*].field0[%47]
                    @store_deref (%60, %54) (wrmask=x, access=none)
        32    %61 = deref_var &#1 (shader_out vec4[1])
        32    %62 = deref_array &(*%61)[%47] (shader_out vec4)  // &#1[%47]
                    @store_deref (%62, %56) (wrmask=xyzw, access=none)
        32    %63 = iadd %47, %6 (0x1)
                    // succs: b6
    }
    block b10:  // preds: b7, succs: b11
    block b11:
}

]]>

inner-daemons avatar Nov 27 '25 23:11 inner-daemons

I also have some AMD GPU code from windows that I'll upload at some point but its absolutely unreadable to my eyes.

inner-daemons avatar Nov 27 '25 23:11 inner-daemons

For future reference: I have taken the generated code with debug symbols and put it through SPIRV-Opt, I have put it through SPIRV-Cross and then back through glslc, and both times it still caused a bug in LLVMPipe. Not sure about AMD however.

I have also narrowed it down to just being the mesh shader part.

inner-daemons avatar Nov 28 '25 04:11 inner-daemons

Update: got this to work by modifying the body of main() in the generated GLSL. Therefore, the interface is fine, and its just an issue with something goofy.

inner-daemons avatar Nov 28 '25 04:11 inner-daemons

These comments are spammy if anyone is listening so you don't need to

I have narrowed it down to this little bit of code that breaks stuff even if the values are immediately rewritten:

  bool cull = mesh_output.primitives[gl_LocalInvocationIndex].cull;
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

Notably, it has to be referenced by gl_LocalInvocationIndex to break, even though that should always be 0. It still results in the following error:

(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[0]: Register never used(null)[1]: Register never used(null)[0]: Register never used(null)[0]: Register never usedLLVM ERROR: Cannot emit physreg copy instruction

I'm beginning to believe this really is a mesa bug

inner-daemons avatar Nov 28 '25 04:11 inner-daemons

The following when replacing the body of the generated GLSL makes it terminate with sigsegv:

void main() {
  mesh_output.primitives[0u].cull = !taskPayload.visible;

  SetMeshOutputsEXT(3, 1);

  bool cull = mesh_output.primitives[gl_LocalInvocationIndex].cull;
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

  gl_MeshVerticesEXT[0].gl_Position = vec4(0.0, 1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[1].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[2].gl_Position = vec4(1.0, -1.0, 0.0, 1.0);

  _43[0]._m0 = vec4(0.0, 1.0, 0.0, 1.0) * taskPayload.colorMask;
  _43[1]._m0 = vec4(0.0, 0.0, 1.0, 1.0) * taskPayload.colorMask;
  _43[2]._m0 = vec4(1.0, 0.0, 0.0, 1.0) * taskPayload.colorMask;

  gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
  _50[0]._m0 = vec4(1.0, 0.0, 1.0, 1.0);
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = false;
}

It doesn't terminate when gl_LocalInvocationIndex isn't used to index mesh_output.primitives, even if it is used elsewhere

inner-daemons avatar Nov 28 '25 05:11 inner-daemons

#version 450
#extension GL_EXT_mesh_shader : require
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout(max_vertices = 3, max_primitives = 1, triangles) out;

struct TaskPayload {
  vec4 colorMask;
  bool visible;
};

out _40 { layout(location = 0) vec4 _m0; }
_43[3];

perprimitiveEXT out _47 { layout(location = 1) vec4 _m0; }
_50[1];

taskPayloadSharedEXT TaskPayload taskPayload;

shared bool array[3];

void main() {
  array[0] = !taskPayload.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

  SetMeshOutputsEXT(3, 1);

  gl_MeshVerticesEXT[0].gl_Position = vec4(0.0, 1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[1].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
  gl_MeshVerticesEXT[2].gl_Position = vec4(1.0, -1.0, 0.0, 1.0);

  _43[0]._m0 = vec4(0.0, 1.0, 0.0, 1.0) * taskPayload.colorMask;
  _43[1]._m0 = vec4(0.0, 0.0, 1.0, 1.0) * taskPayload.colorMask;
  _43[2]._m0 = vec4(1.0, 0.0, 0.0, 1.0) * taskPayload.colorMask;

  gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
  _50[0]._m0 = vec4(1.0, 0.0, 1.0, 1.0);
}

This example is broken but only because of the first 3 lines in main: if you chain the indexing to use 0 it works fine, and if you remove it it works fine.

inner-daemons avatar Nov 28 '25 05:11 inner-daemons

If I change naga's logic to use


        // Current index to copy
        let val_i = if is_primitive {
            self.get_constant_scalar(crate::Literal::U32(0))
        } else {
            let val_i = self.id_gen.next();
            body.push(Instruction::load(u32_type_id, val_i, index_var, None));
            val_i
        };

when deciding which index to copy over, it actually passes all of the tests except for 2: wgpu_gpu::mesh_shader::mesh_pipeline_basic_mesh and wgpu_gpu::mesh_shader::mesh_pipeline_basic_mesh_frag both consistently segfault while nothing else does.

So in conclusion I have no fucking clue what's going on.

inner-daemons avatar Nov 28 '25 05:11 inner-daemons

This shader

#version 450
#extension GL_EXT_mesh_shader : require

const vec4[3] positions = {vec4(0., 1.0, 0., 1.0), vec4(-1.0, -1.0, 0., 1.0),
                           vec4(1.0, -1.0, 0., 1.0)};
const vec4[3] colors = {vec4(0., 1., 0., 1.), vec4(0., 0., 1., 1.),
                        vec4(1., 0., 0., 1.)};

// This is an inefficient workgroup size.Ideally the total thread count would be
// a multiple of 64
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
struct PayloadData {
  vec4 colorMask;
  bool visible;
};
taskPayloadSharedEXT PayloadData payloadData;

out VertexOutput { layout(location = 0) vec4 color; }
vertexOutput[];
layout(location = 1) perprimitiveEXT out PrimitiveOutput { vec4 colorMask; }
primitiveOutput[];

layout(triangles, max_vertices = 3, max_primitives = 1) out;

shared bool array[3];

void main() {
  SetMeshOutputsEXT(3, 1);

  gl_MeshVerticesEXT[0].gl_Position = positions[0];
  gl_MeshVerticesEXT[1].gl_Position = positions[1];
  gl_MeshVerticesEXT[2].gl_Position = positions[2];

  vertexOutput[0].color = colors[0] * payloadData.colorMask;
  vertexOutput[1].color = colors[1] * payloadData.colorMask;
  vertexOutput[2].color = colors[2] * payloadData.colorMask;

  gl_PrimitiveTriangleIndicesEXT[gl_LocalInvocationIndex] = uvec3(0, 1, 2);
  primitiveOutput[0].colorMask = vec4(1.0, 0.0, 1.0, 1.0);

  array[0] = !payloadData.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;
}

Which is a copy-paste of the fully functional SPIR-V that works on NVIDIA, AMD, and LLVMPIPE, except for the shared bool array[3]; and

  array[0] = !payloadData.visible;
  bool cull = array[gl_LocalInvocationIndex];
  gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = cull;

fails to run on LLVMPIPE. LLVMPIPE segfaults here.

inner-daemons avatar Nov 28 '25 05:11 inner-daemons

So that I remember: the tests started failing between 4b7ba3adc and 65be73b982dd591201dddf8225767b9d2faff7bd

inner-daemons avatar Nov 28 '25 07:11 inner-daemons