Draft: Add GL_EXT_mesh_shader
This is an OpenGL extension forking VK_EXT_mesh_shader to provide OpenGL mesh shader functionality.
Numbers in the spec haven't been allocated yet, so it uses fake numbers for now. This PR has no header/XML updates either.
This extension responds to requests from nvidium users for OpenGL mesh shader support in drivers for non-NVIDIA GPUs:
- https://github.com/GPUOpen-Drivers/AMD-Gfx-Drivers/issues/4
- https://gitlab.freedesktop.org/mesa/mesa/-/issues/12189
Re discussion on the mesa issue, Khronos does not "approve" new vendor extensions, though we do try and consistency-check them and make sure they're following the extension guidelines before we include them in the extension registry and hand out enum allocations. It's true that GL spec activity is very minimal within Khronos, but vendor and EXT extension development do not have to happen inside Khronos.
So the first thing to ask is whether there is a commitment to implement this on the part of someone actually writing Mesa drivers. There's no point in publishing an extension spec in the registry if nobody has implemented it. Then, how and why does it differ from the NV extension? I see a slight signature change on one of the APIs but haven't tried to review the whole thing. Because of the close relationship between them, there should at least be a section down around the "Interactions" discussing the things that are the same, and those that had to be changed, and why.
Would it be possible to implement the NV extension as it stands today on your target GPUs, and then add a really small extension on top of that to accommodate the changed signature, rather than duplicate so much of that language?
BTW, when promoting an extension we keep the enum values unchanged so long as they are indistinguishable semantically from the point of view of the driver they are passed to. Only if there's a need to behave differently depending on which extension is being used would the enum value need to change.
Thanks for the explanation.
So the first thing to ask is whether there is a commitment to implement this on the part of someone actually writing Mesa drivers. There's no point in publishing an extension spec in the registry if nobody has implemented it.
Yeah, I'm going to implement it in mesa if it's accepted.
Then, how and why does it differ from the NV extension? I see a slight signature change on one of the APIs but haven't tried to review the whole thing. Because of the close relationship between them, there should at least be a section down around the "Interactions" discussing the things that are the same, and those that had to be changed, and why.
The differences from the NV extension are listed in the Issues Q&A of both specs:
- https://github.com/yuq/OpenGL-Registry/blob/topic/mesh-shader/extensions/EXT/EXT_mesh_shader.txt#L1127
- https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GLSL_EXT_mesh_shader.txt#L1029
Would it be possible to implement the NV extension as it stands today on your target GPUs, and then add a really small extension on top of that to accommodate the changed signature, rather than duplicate so much of that language?
It's not possible to stack a new extension on top of NV, because the NV extension interface (mostly the GLSL part) is not suitable for other GPU vendors; that's why Vulkan created VK_EXT_mesh_shader. We could implement the NV extension with many ugly workarounds in the driver, but that would hurt performance:
- https://gitlab.freedesktop.org/mesa/mesa/-/issues/7192#note_1822130
BTW, when promoting an extension we keep the enum values unchanged so long as they are indistinguishable semantically from the point of view of the driver they are passed to. Only if there's a need to behave differently depending on which extension is being used would the enum value need to change.
The runtime API part mostly comes from VK_EXT_mesh_shader, to leverage the existing agreement reached by different GPU vendors. I can keep the enum values that are the same as in the NV extension and assign fake values to the new ones.
Hi,
So the first thing to ask is whether there is a commitment to implement this on the part of someone actually writing Mesa drivers.
Yes, there is interest in implementing it in RadeonSI as well as in Zink (which would work on top of the Vulkan EXT_mesh_shader exposed by the underlying Vulkan driver).
Then, how and why does it differ from the NV extension?
there should at least be a section down around the "Interactions" discussing the things that are the same, and those that had to be changed, and why
Would it be possible to implement the NV extension as it stands today on your target GPUs
Same as the Vulkan NV vs. EXT extensions. In a nutshell, the NV extension makes it impossible to implement mesh shaders with reasonable performance on other vendors' HW, and EXT fixes that. Furthermore, EXT is better aligned with D3D12 mesh shaders and therefore benefits developers by providing a more familiar programming model.
If you are interested in the exact details, they have been discussed in the Vulkan EXT_mesh_shader blog post and also on the spec MR here, among other places. This comment in the Mesa repo goes through the main issues with implementing the NV extension on HW that wasn't designed for it.
I'd like to see the mesa implementation at least well underway before this is released, but I think it's a great addition to the ecosystem. Bringing cross-vendor support to GL mesh shading will enable things like nvidium to finally run on more platforms.
Then can the numbers be allocated first, so that I can update headers and start implementation?
Yeah that seems good. @oddhack do you take care of that or am I supposed to do something?
@yuq how many do you need? We allocate in blocks of 16. I think I counted 62 enums in the spec as it stands, so I can give you a block of 64 if you need that many (the new bit values not included in that total since they are semantically in a different namespace).
I need 22, which rounds up to 32 when aligned to 16. I reused some enum numbers from GL_NV_mesh_shader; only those beginning with 0xF need to be allocated.
How about the extension serial number (I faked it as 1024)? Does it have to be allocated at release?
Done, see https://github.com/KhronosGroup/OpenGL-Registry/commit/d8fdb8d9e236f2a9c3c0d614773c940942550e3c (enums 0x9740-0x975F).
The extension number is assigned when we publish. It isn't actually used as anything but an ordering mechanism.
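As a quick sanity check of the arithmetic above, here is a minimal sketch (the helper names are hypothetical, not part of any registry tooling) that rounds a request up to whole blocks of 16 and counts the values in the allocated range:

```python
BLOCK_SIZE = 16  # per the thread, enums are allocated in blocks of 16


def round_up_to_block(needed, block=BLOCK_SIZE):
    """Return the smallest multiple of `block` that is >= `needed`."""
    return -(-needed // block) * block  # ceiling division, then scale back up


def range_size(first, last):
    """Number of values in the inclusive range [first, last]."""
    return last - first + 1


# 22 requested enums round up to two blocks of 16.
requested = round_up_to_block(22)
# The allocated range quoted in the thread.
allocated = range_size(0x9740, 0x975F)
print(requested, allocated)
```

Both numbers come out equal, so the allocated range 0x9740-0x975F covers exactly the two blocks needed for 22 new enums.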
OK, thanks.
We are 100% in support of this proposal and are excited about working with this functionality.
I've finished the implementation in Mesa for AMD GPUs. Next I'm going to upstream the code while testing it more. https://gitlab.freedesktop.org/yuq825/mesa/-/commits/topic/mesh-shader
Could this MR be merged now? @oddhack @zmike
@yuq WG has some review comments pending. Also will wait to ship until nvidium is at least semi-working with this (pending) to ensure things are usable as expected.
I think nvidium only works on NVIDIA cards? Is switching from NV's mesh shader implementation to this enough to make it run on AMD cards?
Are there any guarantees about how/where the mesh shader workgroups are launched? I know it would be hardware dependent, but as an example: does launching many mesh tasks from one task shader and few from other task shaders perform significantly worse than a roughly even distribution of mesh tasks? I have a project for which I would be interested in implementing a renderer using this extension; the main concern is translucency ordering, I think.
Are there also any guarantees about how work is dispatched from the mesh shader to the remaining raster pipeline? That is, if a single mesh shader workgroup from a set of mesh shader workgroups (dispatched from a task shader) takes significantly longer to complete, would this block the other workgroups from dispatching work to the raster pipeline? (I'm asking because of the possibility of doing compute rasterization in the mesh shader while still dispatching larger triangles to the HW raster pipeline, which would also enable higher GPU silicon utilization and hopefully maximize throughput.)
Though I guess this is also completely implementation dependent and
WG has some review comments pending. Also will wait to ship until nvidium is at least semi-working with this (pending) to ensure things are usable as expected.
I think nvidium only works on NVIDIA cards? Is switching from NV's mesh shader implementation to this enough to make it run on AMD cards?
@Headcrabed As far as I see the author of nvidium @MCRcortex is here with us, so I will interpret his presence as being interested in porting nvidium to use the EXT mesh shaders instead of the NV extension. Assuming no other NVidia specifics are used by nvidium, it should then work on other GPUs too.
I'm not that familiar with radeonsi internals, so maybe I am missing something obvious like a GFX10+ feature used in the code, but I am curious: can you shed some light on what is preventing this from being used on GFX9 as well (the code requires >=GFX10_3)?
@Ristovski Previous GPUs don't have the hardware capability to implement mesh shaders. Most notably, only GFX10.3 and newer support per-primitive outputs.
Are there any guarantees about how/where the mesh shader workgroups are launched?
@MCRcortex Not sure if this thread is the right one to discuss GPU specific implementation details, but I'm happy to answer your questions about how it works at least on AMD HW. I reached out to you on your Discord.
With regards to rasterization order: it seems that both AMD and NVidia do guarantee the "strict" rasterization order in their currently released GPUs. I haven't got any info about other hardware vendors yet.
Considering that the Vulkan spec differs from this proposal as well as from the D3D12 spec, I asked for clarification on the Vulkan spec to confirm whether the current wording is what was intended. For consistency between OpenGL and Vulkan, I suggest waiting until that is resolved before we move forward here.
I'm doing some work on mesh shaders now and had another question: would it be possible to get a gl_DrawID as described in ARB_shader_draw_parameters for multidraw commands? It's possible to get the "inner ID" (roughly equivalent to gl_InstanceID) with gl_WorkGroupID. NV_mesh_shader has a rough equivalent of this through the 'first' argument in its draw commands; quoting the spec:
The x component of gl_WorkGroupID of the first active stage will be within the range of [<first> , <first + count - 1>]
I doubt a 'first' argument can be added, but would it be possible to have gl_DrawID work as a replacement (as is already described in ARB_shader_draw_parameters)?
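To make the 'first' semantics concrete, here is a small CPU-side model of the ID ranges being discussed. It is only a sketch: the function names are hypothetical, it assumes that a group-count style draw numbers its workgroups from 0, and the "base offset" emulation is one possible application-side workaround, not something the spec prescribes.

```python
def nv_workgroup_ids(first, count):
    """x component of gl_WorkGroupID seen by the first active stage for an
    NV-style draw taking (first, count): the range [first, first + count - 1]."""
    return list(range(first, first + count))


def ext_workgroup_ids_x(num_groups_x):
    """Assumption: a draw that only takes a group count numbers its
    workgroups from 0 in x, like a compute dispatch."""
    return list(range(num_groups_x))


def emulate_first(base, num_groups_x):
    """Hypothetical workaround: the shader adds an application-supplied base
    (for example a uniform, or a value looked up per draw) to its own ID."""
    return [base + i for i in ext_workgroup_ids_x(num_groups_x)]


# The emulated IDs cover the same range as the NV-style draw.
print(nv_workgroup_ids(4, 3))
print(emulate_first(4, 3))
```

With a per-draw index available in the shader, the base could be fetched from a buffer indexed by that value, which is why a draw ID is a plausible replacement for 'first' in multidraw use.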
would it be possible to get a gl_DrawID as described in ARB_shader_draw_parameters for multidraw commands?
Yes, the GLSL spec describes it: https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GLSL_EXT_mesh_shader.txt#L964
Oh awesome, thank you, I missed that in the spec.
I have a project that should make it easier to test this extension (unfortunately I'm missing the hardware to do so, as I don't have any AMD hardware).
How does this interact with ARB_separate_shader_objects? There are no explicit notes in the spec here for it.
ARB_separate_shader_objects was added to core in OpenGL 4.1 and this extension is written against the OpenGL 4.6 spec, so there are no explicit notes on the ARB_separate_shader_objects interaction. There are some notes that add the mesh shader stages to the spec language introduced by ARB_separate_shader_objects:
- https://github.com/KhronosGroup/OpenGL-Registry/blob/95b430effdf13555bbbea2598dc83d5eca9fa740/extensions/EXT/EXT_mesh_shader.txt#L196
- https://github.com/KhronosGroup/OpenGL-Registry/blob/95b430effdf13555bbbea2598dc83d5eca9fa740/extensions/EXT/EXT_mesh_shader.txt#L375
- https://github.com/KhronosGroup/OpenGL-Registry/blob/95b430effdf13555bbbea2598dc83d5eca9fa740/extensions/EXT/EXT_mesh_shader.txt#L390
Ah, thanks, I missed that.
I talked a bit about this with @zmike, but it should also be raised here: how do mesh shaders interact with GL_RASTERIZER_DISCARD? In my opinion it should be ignored, as it does not make much sense for mesh shaders; however, I know there is a spec clarification issue open for this on the Vulkan extension. It might be good to either wait for the clarification or make a decision now, so that later revisions to the spec aren't needed for this issue.
how do mesh shaders interact with GL_RASTERIZER_DISCARD? In my opinion it should be ignored, as it does not make much sense for mesh shaders
The way I understand it, it should work the same as for any other pre-rasterization stage. Why should it be ignored?
EDIT: I'd like it to be consistent with Vulkan. Does Vulkan ignore it?
It's not defined in the Vulkan spec, from what I understand (hence zmike requesting clarification). Personally, I think it should be ignored, as it would be equivalent to mesh shaders emitting a size of zero. If rasterization discard is not included, that could be an optimization opportunity for drivers. Transform feedback is not applied either; however, reading the specification again, it does suggest that rasterization discard is applied:
After programmable mesh processing, the same fixed-function operations are applied to vertices of the resulting primitives as above, except the transform feedback (see section 13.3), primitive queries (see section 13.4) and transform feedback overflow queries (see section 13.5) are replaced by mesh primitive queries (see section 13.Y).
Furthermore, if rasterization discard is enabled, a fragment shader is not strictly required to even be attached (from what I have read; this is probably incorrect, however).
With Vulkan the rasterization discard state still applies when drawing with mesh shaders.