GLSL
GLSL copied to clipboard
Why is `gl_SubgroupSize` not a constant?
Why is gl_SubgroupSize
not a constant?
Per Vulkan spec it is PhysicalDevice property, so it feels it could be a built-in constant.
Currently it seems to be listed under built-in variables.
I am also not certain why gl_NumSubgroups
is only "uniform across the invocation group". Why would it ever change within the workgroup?
Variables are not considered constant unless their value is known when the front-end compiler translates the GLSL to SPIR-V. (There are also specialization constants for SPIR-V, which can change their value when the SPIR-V is given to the driver.)
If something could vary across pipelines, or platforms, etc., it cannot be a front-end constant.
That's weird. Almost all built-in constants are "Implementation-dependent".
gl_SubgroupSize
is practically the same. It is a physical device limit too.
It is declared as mediump in uint gl_SubgroupSize;
allowing it to vary more than a constant. However, I also see in the issue section that it was resolved to be a property of just the device, making it like other constants.
@sheredom, anything else to add?
Nope, I think you've covered it all @johnkslang.
In theory it could be a constant as long as the value is only defined when the IHV driver consumes the SPIR-V, in practice I'm not sure it matters much?
I am also not certain why gl_NumSubgroups is only "uniform across the invocation group". Why would it ever change within the workgroup?
You are probably right here, probably a copy/pasta issue. I'll take a look.
I'm not sure it matters much?
Front ends (and GLSL) can do more with a const
than they can with an in
, but I don't know if this specific one benefits from being a constant.
On further internal discussion, it is better to keep this as an in, at least for Vukan, because we don't want to make the ISV change the .conf
file per target platform.
So, the parts of the extension specification that say it is constant should be corrected to be consistent with being an in.
It is bit unfortunate. That would disqualify its use in e.g. subgroupBroadcast()
. Shouldn't something like subgroupBroadcast(x, gl_SubgroupSize - 1)
be allowed?
I see the value in that, but suspect it should be achieved by a different change.
because we don't want to make the ISV change the
.conf
file per target platform.
This is also identical problem as for the other physical device limits, no? The default config should simply be const gl_SubgroupSize = 1
. The user provides the conf
to the compiler to check if his code is correct — and the default config is supposed to guarantee compatibility with all drivers.
PS: Hmm, either way it has to stay a variable until incompatible change can be made... drivers wouldn't know how to populate it with value if it was changed to OpSpecConstant
.
This is also identical problem as for the other physical device limits, no?
Technically, but in practice there is a different emphasis on knowing this compared to limits: This one is the actual value in play, versus a min/max limit.
Why is
gl_SubgroupSize
not a constant? Per Vulkan spec it is PhysicalDevice property, so it feels it could be a built-in constant. Currently it seems to be listed under built-in variables.I am also not certain why
gl_NumSubgroups
is only "uniform across the invocation group". Why would it ever change within the workgroup?
It was originally specified this way due to the way Intel GPUs handle SIMD groups. SIMD groups can be 8, 16, or 32. For vertex processing stages and compute, the SIMD group size is fixed during shader compilation. For fragment stage, the SIMD group size will vary across the primitive, up to a limit based on register pressure, depending on how the hardware decided to dispatch the groups.
I think we may have implemented these variables in a way that could allow some of these variables to be const (instead of uniform) or uniform (instead of in). I need to double check.
As noted above as well as over here ( https://github.com/KhronosGroup/glslang/issues/1591#issuecomment-442121566 ), not treating gl_SubgroupSize
as a constant is going to be painful for developers.
If a device doesn't support shader_subgroup_shuffle
then there isn't a simple replacement for subgroupBroadcast(gl_SubgroupSize-N)
.
Why is
gl_SubgroupSize
not a constant? Per Vulkan spec it is PhysicalDevice property, so it feels it could be a built-in constant. Currently it seems to be listed under built-in variables. I am also not certain whygl_NumSubgroups
is only "uniform across the invocation group". Why would it ever change within the workgroup?It was originally specified this way due to the way Intel GPUs handle SIMD groups. SIMD groups can be 8, 16, or 32. For vertex processing stages and compute, the SIMD group size is fixed during shader compilation. For fragment stage, the SIMD group size will vary across the primitive, up to a limit based on register pressure, depending on how the hardware decided to dispatch the groups.
I think we may have implemented these variables in a way that could allow some of these variables to be const (instead of uniform) or uniform (instead of in). I need to double check.
What we ended up implementing was always set gl_SubgroupSize to either the compile-time computed value (for vertex, compute, etc.) or 32 (the maximum possible value for fragment stage), and mark the other channels as inactive when executing in a narrower SIMD mode. I believe it should be safe for us to change gl_SubgroupSize to const.
Given that subgroupProperties.subgroupSize
on the vulkan side gives us only one number, is there any way for the end user to distinguish when it is not constant?
I am implementing a stream compaction phase for my path tracer using subgroups and tried adding the following line:
shared float sumdata[gl_SubgroupSize];
but as a workaround now have to do:
layout (constant_id = 0) const int userProvidedSubgroupSize = 64;
shared float sumdata[userProvidedSubgroupSize];
and on the vulkan side, I then use VkSpecializationInfo
to always sret it to subgroupProperties.subgroupSize
which seems a bit counter-intuitive given
its always constant for a given Device
Vulkan has VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT
which says whether or not the SPIR-V SubgroupSize
can vary.
If memory serves me well, Intel can configure your shader to run in SIMD4,8, and 16 modes.
I feel there are two issues here:
- The reason @krOoze created this ticket, which can be traced to his comments in the Vulkan tutorial. In there he seems to believe that if gl_SubgroupSize were to be constant, then the following would be true:
subgroupBallotFindMSB( subgroupBallot(true) ) == gl_SubgroupSize - 1;
subgroupMax(global_index) == subgroupBroadcast(global_index, highestActiveID);
Unfortunately that's wrong. The reason those equalities are wrong is not because gl_SubgroupSize isn't constant, but because the driver is (as per spec) allowed to issue a dispatch of 16x1x1 threads to be processed by 16 different waves with 63 inactive waves and 1 active wave instead of 1 wave with 16 active threads and 48 inactive ones. It would be extremely inefficient, but it is allowed.
Although extremely likely, we don't have a guarantee that threads [0; gl_WorkGroupSize) will correspond 1:1 to [0; gl_SubgroupSize).
In fact the GL Mesa driver when operating in Wave32 on RDNA currently sets gl_SubgroupSize = 64 and always returns the last 32 bits as inactive threads. This is bizarre, but allowed as per specs.
If you want the strong guarantee you're looking for, then use VK_EXT_subgroup_size_control
to force gl_SubgroupSize to a specific value and VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT
to ensure 1:1 [0; gl_WorkGroupSize) -> [0; gl_SubgroupSize). Someone correct me if I'm mistaken.
This is a desirable property because very clever optimizations can be done if we can assume gl_LocalInvocationID == gl_SubGroupInvocation with no inactive threads.
In fact if I'm wrong and EXT_subgroup_size_control
+ PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT
is not enough for that, then another ticket should be created.
- The issue that gl_SubgroupSize isn't constant. ~It is already uniform. This is problematic because uniform is meant to be the same value over all the workgroups. So OP is right that if gl_SubgroupSize is uniform, then it should be a constant~. Update:
GL_ARB_shader_ballot
declares it as uniform, but GL_KHR_shader_subgroup does not, although question 6 in the FAQ is confusing:6. Should gl_SubgroupSize be allowed to vary (for example across shader stages)? RESOLUTION: NO (...)
.
However drivers and implementations don't want this to be the same value. They're treating it like a special case where it can vary per workgroup. If GLSL were to adopt my https://github.com/KhronosGroup/GLSL/issues/81 proposal the correct decoration would be either threadgroup_uniform
or simd_uniform
.
My recommendation is that this issue will only be resolved until #81 is adopted and then it gets decided whether gl_SubgroupSize should be threadgroup_uniform
or simd_uniform
.
VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT
is indeed problematic here, which makes it "only" a subgroup-scope uniform. It quickly gets messy to express all the levels of conditional constness...
PS: Maybe the original issue should be closed. The window in which it could have been fixed trivially is gone. Now it is settled and extensions are made on top of those foundations. It was mostly matter of purism for me, but as we see the constness situation only got more complicated and exceed the humble concern of the original Issue.
Notably also, I think subgroupBroadcast()
was relaxed in higher versions of SPIR-V.