GLSL icon indicating copy to clipboard operation
GLSL copied to clipboard

Does branching on gl_HelperInvocation require the invocation goes inactive?

Open sheredom opened this issue 6 years ago • 9 comments

When I spec'ed the GL_KHR_shader_subgroup specification it was negotiated that helper invocations contribute in fragment shader subgroup ops: In fragment shaders, helper invocations participate in subgroup operations.

I'm digging into some of the corner cases with this, and came across a fun one:

#version 450
#extension GL_KHR_shader_subgroup_ballot: enable

layout(set = 0, binding = 0) buffer foo_layout {
    float in_thing[];
};

layout(set = 0, binding = 0) buffer bar_layout {
    float out_thing[];
};

void main() {
    // Lets assume we have at least one helper invocation...
    uvec4 outer_ballot = subgroupBallot(true);

    if (!gl_HelperInvocation) {
        uvec4 inner_ballot = subgroupBallot(true);
        
        // Are there any requirements that the helper invocation must be inactive here?
        // Or (test-wise) can outer_ballot == inner_ballot?
    }
}

So the question is - are helper invocations required to become inactive when they are involved in control flow?

Note - previously (pre subgroup ops) we had no way to detect whether this could/would happen!

sheredom avatar Oct 30 '18 15:10 sheredom

IMO yes, they become inactive. If they didn't, then shaders wouldn't have a way to reliably combine only the values for non-helper invocations.

jeffbolznv avatar Oct 30 '18 16:10 jeffbolznv

@jeffbolznv I think I agree with you that they should become inactive, but I guess where should we say this?

I'm not sure amending the subgroup extension to define what is essentially core execution behaviour of non-helper invocations is the correct approach, but it could be a stop-gap.

If this should become a clarification to the GLSL specification then this wouldn't just be a Vulkan issue as @pdaniell-nv tagged it 😉

sheredom avatar Oct 31 '18 14:10 sheredom

This sounds like it's really a specific instance of us not having a well-defined execution model that says when invocations become inactive and then active again, diverge and reconverge, etc.. Not a minor fix...

jeffbolznv avatar Oct 31 '18 14:10 jeffbolznv

@jeffbolznv agreed.

sheredom avatar Oct 31 '18 14:10 sheredom

@sheredom if you want to continue working on improving the definition of gl_HelperInvocation it might be best to move that to https://gitlab.khronos.org/GLSL/GLSL.

pdaniell-nv avatar Nov 02 '18 18:11 pdaniell-nv

We discussed this briefly at the end of the 11/14 OpenGL call, but came to no conclusions.

Short of a full-blown formal execution model, It seems like we might want to propose some more explicit language describing semantics of helper invocations.

Assuming we can all agree on semantics, I would suggest defining helper invocations as behaving exactly as non-helper invocations, other than these invocations should have no side effects. For example:

  • Any shader outputs are ignored because the invocation does not have a corresponding "live" fragment.
  • Stores to shader storage buffers, images, and other "global" resources have no effect.
  • Stores performed by shader atomic operations have no effect.

One issue we confronted at NVIDIA early on when we did extensions like NV_shader_buffer_load, is what happens to loads performed by helper invocations. In particular, when using pointers without bounds checking, code like the following is potentially risky.

address = basePointer + stride * gl_FragCoord.y + gl_FragCoord.x; value = *address;

If the application has a memory allocation accommodating a 100x77 rectangle, with exactly 7700 elements, you can end up with a "bad" address because you might have a helper invocation with (x,y) coordinates of (99,77), which would result in an out-of-bounds offset of 7799 and could result in faults. We had considered specifying that loads in helper invocations also don't perform a memory access (maybe returning zero) to avoid faults. But that would break shaders that actually did need to do such loads. So we ultimately decided to have them behave normally. (I think atomics always return the value fetched but do not perform an update.)

We ended up proposing gl_HelperInvocation in part to allow applications to defend against issues introduced from memory accesses in helper invocations:

  • spin loops involving atomics that don't actually lock/unlock
  • loads from bad addresses due to unexpected (x,y)

Well-defined semantics might also turn out to be useful for:

https://gitlab.khronos.org/vulkan/vulkan/merge_requests/2913

where there is a proposed hypothetical extension that effectively demotes "real" fragment shader invocations to helper invocations when "discard" is executed. This is the execution model used by D3D.

One other thing we might want to consider in the execution model is to allow helper invocations to commit suicide if they are no longer needed. For example, if you have code like:

if (!gl_HelperInvocation) discard;
// do other stuff

there is no point in continuing to execute the helper invocations after all "real" invocations have terminated. You can't actually observe whether the helper invocations continue to execute in this case, unless and until "do other stuff" would result in an exception or an infinite loop.

nvpbrown avatar Nov 14 '18 16:11 nvpbrown

@reduz hit this problem with a real world case scenario:

if (subgroupElect()) {
	atomicOr(cluster_render.data[usage_write_offset],usage_write_bit);
}

He wants to avoid the performance penalty of writing from many threads (contention) what is essentially the same value.

However doing this from a fragment shader sometimes causes a helper invocation to run the code, which ignores the atomicOr.

We thought of using:

if( gl_HelperInvocation ) {
  if (subgroupElect()) {
	atomicOr(cluster_render.data[usage_write_offset],usage_write_bit);
  }
}

but according to this ticket this is unresolved / undefined.

So I guess alternative solutions would be:

if( gl_HelperInvocation )
	discard; // May have undesired side effects (such as Hardware Z stuff, penalties on TBDR, etc)

if (subgroupElect()) {
	atomicOr(cluster_render.data[usage_write_offset],usage_write_bit);
}

Or:

while( !subgroupBroadcast( gl_HelperInvocation, id ) )
	++id;

if ( gl_SubgroupInvocationID == id ) {
	atomicOr(cluster_render.data[usage_write_offset],usage_write_bit);
}

Or

id = findLSB( subgroupBallot( gl_HelperInvocation ) );

if ( gl_SubgroupInvocationID == id ) {
	atomicOr(cluster_render.data[usage_write_offset],usage_write_bit);
}

Is one of these approaches the better one?

darksylinc avatar Jan 14 '21 16:01 darksylinc

but according to this ticket this is unresolved / undefined.

I believe this to be a mistake. If we truly intend to make helper invocations "behaving exactly as non-helper invocations" non-helper invocations would behave as intended!

Example:

uvec4 outer_ballot = subgroupBallot(true);
if( myOwnVariable[gl_SubgroupInvocationID] )
{
    if (subgroupElect()) {
        // This literally can't include inactive invocations where myOwnVariable[gl_SubgroupInvocationID] == false
    }

    uvec4 inner_ballot = subgroupBallot(true);
    // inner_ballot can't be == outer_ballot
}

If we populate myOwnVariable[gl_SubgroupInvocationID] = gl_HelperInvocation then it would work; but if we evalute gl_HelperInvocation directly it would not work? That's insane.

darksylinc avatar Jan 14 '21 16:01 darksylinc

This was discussed in the working group today and the behavior is supposed to be defined for this case. Branching on gl_HelperInvocation requires that the invocation become inactive. Any of the solutions above should work for avoiding the problem of accidentally picking a helper invocation (although it looks as though some of the code snippets always pick a helper invocation instead of avoiding them).

The spec gives some details about helper invocations in section 9.17 of the Vulkan 1.2 spec. The (unstated) assumption is that helper invocations behave like normal invocations except where exceptions are made, so if a normal invocation would be inactive then the helper invocation must be inactive. (The converse is not true though, where a normal invocation would be active, a helper invocation may be inactive).

gnl21 avatar Jan 21 '21 16:01 gnl21