Uniformity analysis is far too strict
Now that uniformity analysis is being treated as an error instead of a warning, many of the shaders we've been working with are being flagged with uniformity errors. Modifying the shaders to pass the analysis is a monumental task for an existing engine, where we translate existing shaders from an enormous library, where shaders use libraries of includes, or are otherwise composited from multiple sources. We cannot just "rewrite the shader" to pass the analysis check.
I believe the analysis is far too strict, particularly for non-vertex shaders. One example of an analysis error in a fragment shader:
let cutout = textureSample(cutoutTex, texSampler, uv);
if (cutout.a > 0.5) { discard; }
// ... lots of code for the rest of the shader
let color = textureSample(colorTex, texSampler, uv);
Any textureSample call after that conditional discard is considered non-uniform.
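One workaround that sidesteps the implicit-derivative requirement (a sketch, not a general fix: `texSampler` is renamed because `sampler` is a reserved type name in WGSL, and forcing mip level 0 changes filtering behavior, which may or may not be acceptable):

```wgsl
let cutout = textureSample(cutoutTex, texSampler, uv);
if (cutout.a > 0.5) { discard; }
// ...
// textureSampleLevel takes an explicit mip level, so it does not require
// uniform control flow -- but it also skips the implicit LOD computation.
let color = textureSampleLevel(colorTex, texSampler, uv, 0.0);
```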
No other graphics API or platform has a problem with this pattern. But now WebGPU will error on it.
This is just one example, but other errors I've been hitting are of a similar pattern. We need to loosen up the analysis and what is flagged as an error. As it is, I'm not confident we can reasonably port our engine to WebGPU.
The alpha-discard example you showed should be fixed (at least in the spec), because discard now has demote_to_helper semantics instead of terminate. Can you share some of the other examples that were not fixed by the change in discard semantics?
Thanks. Yes, I'm in the process of putting together a catalog of all the types of errors we're getting. I'll share that here soon.
In general, we know that the analysis is going to need various adjustments to accommodate common patterns that the analysis isn't smart enough to recognize are fine. But this has to happen on a case-by-case basis, so posting about the problems you run into is very helpful. Obviously, the details matter, but our hope is that in the end, mass conversions should only require a few adjustments --- assuming the code is actually correct.
I'm still looking through the 1,865 shader variants being flagged with errors from a single project I've been testing, to identify the root causes of the analysis errors. So far, many of them are caused by the aforementioned discard pattern, and by global-variable false positives. A local variable that is set to a non-uniform value, and then set to a uniform value, will still be considered non-uniform. At least for us, these could possibly be resolved in the shader translation process, since the global variables are being introduced by Tint's translation of SPIR-V to WGSL. The "every global var is non-uniform" rule is generally problematic, though I understand the increased scanning complexity relaxing that rule would impose. I would argue the problems it causes have a greater cost than the safety it promises.
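A minimal sketch of the local-variable pattern described above (names are illustrative: `frag_in.value` stands in for any non-uniform fragment input, `u.threshold` for a value from a uniform buffer):

```wgsl
var x = frag_in.value;   // non-uniform at this point
x = u.threshold;         // overwritten with a uniform value
if (x > 0.5) {           // the analysis still treats x, and therefore
  // this branch, as non-uniform, so the sample below is flagged
  let c = textureSample(t, s, uv);
}
```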
@brendan-duncan I just realized my comment above was a bit dry. I thought Tint had the uniformity of discard up to spec, but that seems not to be the case, so I can definitely imagine it would be hard to filter out false negatives from true negatives. Let's collaborate a bit more on making Tint better so we can see which problematic cases aren't just the ecosystem missing features (I'm sure there will be some, since you showed one issue earlier where there was a sample in a condition depending on the value of a sample, IIRC).
As it is, I'm not confident we can reasonably port our engine to WebGPU.
I would consider that an utter failure of WebGPU.
I would argue the problems it causes have a greater cost than the safety it promises.
This was understood from the beginning as a potential risk, since no other shader language or tooling enforced this level of strictness. We spent a lot of innovation tokens on this, and it's arguably the densest part of the spec. Your feedback is tremendously valuable.
There are a few issues at play:
- Patterns that are actually pedantically correct, but the compiler can't analyze:
  - e.g. branch on a value that the programmer knows is uniform, but the compiler can't tell (e.g. a value from a storage buffer).
  - Programmer knows better, pedantically.
- Patterns that are pedantically incorrect, but any inaccuracy or non-portability is within application tolerance:
  - e.g. the derivative is not computed perfectly, so you might select the wrong level of detail on a texturing operation. But perhaps the error has small perceptual impact. (Does the game player notice an artifact at the edge of an object at 30fps?)
  - Programmer knows better, accepts the tradeoff.
- Conservative approximations in the analysis:
  - e.g. a uniform value stored to a module-scope variable, then used in another function. The analysis is conservative because it is static analysis and only intra-procedural.
- Translation mechanisms that hide the uniformity: e.g. Tint's SPIR-V reader copies pipeline inputs to module-scope variables for later use, which then triggers the loss of accuracy due to conservatism about module-scope variables.
  - This has a material effect for the workgroup_id and num_workgroups builtin inputs, which are by definition uniform. No other builtins are considered uniform, so fragment shaders are unaffected by this.
  - This aspect can be fixed with a more sophisticated translation.
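The translation-induced case looks roughly like this after conversion (a sketch of the pattern, not Tint's exact output; names are illustrative):

```wgsl
var<private> wgid : vec3<u32>;  // module-scope copy of the builtin input

fn helper() {
  // wgid is uniform by definition, but the analysis only sees a read of
  // a module-scope variable and must assume it is non-uniform.
  if (wgid.x == 0u) {
    workgroupBarrier();  // flagged: not provably uniform control flow
  }
}

@compute @workgroup_size(64)
fn main(@builtin(workgroup_id) id : vec3<u32>) {
  wgid = id;  // the builtin's uniformity is lost at this store
  helper();
}
```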
The "programmer knows best" scenarios (1 and 2) are legitimate and can't be fixed by smarter WebGPU implementations.
To me this is very strong evidence that we need an escape hatch of some sort.
Usability matters a lot.
That escape hatch should be usable from Unity's use case: starting with something like HLSL through a chain of translations into SPIR-V then WGSL. The chain-of-translations flow represents a very important class of applications, definitely not only Unity.
Another example of a shader that falls in the "yes, this is non-uniform, but we know what we're doing" category is from one of our UI blit shaders (example in HLSL, but it looks similar in WGSL):
fixed4 frag(v2f i) : SV_Target
{
    fixed4 color = fixed4(1, 1, 1, 1);
    switch ((int)i.texcoord.z)
    {
        case 0: color = tex2D(_MainTex0, i.texcoord.xy); break;
        case 1: color = tex2D(_MainTex1, i.texcoord.xy); break;
        case 2: color = tex2D(_MainTex2, i.texcoord.xy); break;
        //case 3 .... 8:
    }
    return color;
}
Another problem is that in other shading languages, implicit derivatives are allowed if control flow is uniform among all fragments for a single primitive but WGSL requires uniformity across the entire draw call. So something like this is totally allowed in GLSL, but not in WGSL:
vec4 color = vec4(1,0,0,1);
if (gl_InstanceID == 0)
color = texture(sampler2D(tex, linear), texcoord.xy);
(In a more real-world example, you might use gl_InstanceID to index into a uniform array and then branch on that value instead.)
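In WGSL, the same shape would look roughly like this (a sketch; the instance index is assumed to be passed from the vertex stage as a flat-interpolated user input, since fragment shaders have no instance builtin):

```wgsl
@group(0) @binding(0) var tex : texture_2d<f32>;
@group(0) @binding(1) var samp : sampler;

@fragment
fn fs(@location(0) @interpolate(flat) instance : u32,
      @location(1) uv : vec2<f32>) -> @location(0) vec4<f32> {
  var color = vec4<f32>(1.0, 0.0, 0.0, 1.0);
  // The condition is uniform per primitive (flat interpolation), but WGSL
  // requires uniformity across the whole draw, so the sample is flagged.
  if (instance == 0u) {
    color = textureSample(tex, samp, uv);
  }
  return color;
}
```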
This is likely to be an issue in piet-gpu as well. I'm linking a gist of a mostly automated translation of coarse.comp, not intended to be correct but intended to highlight the uniformity issue. It fails with the following error:
coarse.wgsl:1610:5 error: 'workgroupBarrier' must only be called from uniform control flow
workgroupBarrier();
^^^^^^^^^^^^^^^^
coarse.wgsl:2136:5 note: control flow depends on non-uniform value
if (((x_2196 >= x_2197) & (x_2199 >= x_2200))) {
^^
coarse.wgsl:1392:28 note: reading from workgroup storage variable 'sh_part_count' may result in a non-uniform value
let x_1270 : u32 = sh_part_count[255i];
This is basically another manifestation of #2321. I'm trying to broadcast a value from a shared memory location to all threads. This is uniform as long as there's no race condition, so basically uniformity analysis depends on proving the absence of races. It is possible to work around this but with some effort and loss of performance. This idiom helps load-balance work across threads.
(For completeness, repro steps for generating that gist: coarse.comp is compiled to coarse.spv using glslangValidator (spv is in gen/ subdir of piet-gpu repo). A few constructs hand-patched out of coarse.comp, mostly atomics, but also an instance of findLSB, which fails with error: unhandled GLSL.std.450 instruction 73 even though firstTrailingBit is available. Rename ref and mat as those are reserved words. Run tip-of-tree tint over resulting spir-v. Remove @stride(4). Fix a couple type mismatches caused by multiple structs being structurally equal.)
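A minimal sketch of the broadcast idiom (heavily simplified from coarse.comp; names are illustrative):

```wgsl
var<workgroup> sh_part_count : array<u32, 256>;

@compute @workgroup_size(256)
fn main(@builtin(local_invocation_id) lid : vec3<u32>) {
  // Write phase: each invocation writes only its own slot.
  sh_part_count[lid.x] = lid.x;
  workgroupBarrier();
  // Read phase: every invocation reads the same slot. Absent a race, the
  // value is uniform across the workgroup, but the analysis would have to
  // prove the absence of the race to know that.
  let ready_ix = sh_part_count[255];
  if (ready_ix > 0u) {
    workgroupBarrier();  // flagged: condition depends on workgroup storage
  }
}
```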
Related #2321 which discussed an escape hatch for marking a load from a storage buffer as uniform.
Ah, added that before I saw/read @raphlinus' comment... :-) I was triaging something else.
It's going to be tricky to design a targeted escape hatch that can be used by translation chains, because the upstream languages won't carry any information that could indicate when the escape hatch should be applied.
For example, annotations on expressions or struct members aren't something a translator could produce from HLSL, because there's no way for the author of the HLSL to indicate where to insert them.
It's going to be tricky to design a targeted escape hatch that can be used by translation chains, because the upstream languages won't carry any information that could indicate when the escape hatch should be applied.
Agreed.
I think we have distinct audiences:
- Class A: Folks who have a long chain of translations from other original source. Unity, for example.
- Class B: Folks for whom WGSL is the original source. Includes a long tail of developers making a Web-first GPU app.
These are clearly different use cases:
- Class A folks often do their own qualification/certification/QA and tell their downstream users (or internal teams) what works where.
  - They take on their own responsibility for correctness for the uniformity cases.
  - They need a simple big hammer of an escape hatch, a single knob to turn on.
- Class B folks are the ones who need the guidance and guardrails that the uniformity analysis is supposed to provide. They:
  - (1) learn from the analysis feedback, and
  - (2a) have a good chance at fixing their code if it truly is a mistake, or
  - (2b) apply a targeted escape to their WGSL source when they truly know better.
So basically, it looks like two kinds of escape hatches for the two classes of audience.
If we agree on that, then the rest is spelling.
Yes, that's very much in line with my thoughts. I don't think we should be afraid of a bulk opt-out flag, because we intend to make the uniformity analysis valuable: people who can leave the checks on will want to do so. If everyone just flips the bulk switch, then that shows that we failed to provide feedback that they found helpful.
(By analogy: Rust devs don't simply slap unsafe on everything - but that is because safe Rust is quite usable.)
I'm a little concerned that making it useful for your second category of users may require a good bit of further work on the standard. It may not be ergonomic to simply provide (say) an expression form that coerces a value to uniform in the eyes of the analysis. We may instead need attributes on variables, struct members, and (considering the example shown earlier where the author "knows" that some vector's z value is uniform across primitives) vector components, to avoid forcing people to repetitively annotate each use of a value that they know is uniform. That is "spelling" - but it's a lot of it.
I took a closer look at @raphlinus' example.
There's a function-scope variable ready_ix that triggers the uniformity violation.
The loop exit depends on ready_ix, near line 2135:
let x_2197 : u32 = ready_ix; // Propagated into x_2197
let x_2199 : u32 = partition_ix;
let x_2200 : u32 = n_partitions;
if (((x_2196 >= x_2197) & (x_2199 >= x_2200))) { // and exit condition depends on x_2197
break;
}
And ready_ix is updated in exactly one place inside the loop:
workgroupBarrier();
let x_1270 : u32 = sh_part_count[255i];
ready_ix = x_1270;
But it occurs immediately after a workgroupBarrier(). Then I noticed that Raph's algorithm appears to split execution into alternating phases that write to sh_part_count and phases that read from sh_part_count, and where the phases are separated by workgroupBarrier. So yes, there is no race on sh_part_count, and when reading, all invocations see the same values in the variable. This is firmly in the "programmer knows best" camp, confirming what @raphlinus wrote.
Yes, that's very much in line with my thoughts.
Ok, so some kind of big switch.
I'm a little concerned that making it useful for your second category of users may require a good bit of further work on the standard. It may not be ergonomic to simply provide (say) an expression form that coerces a value to uniform in the eyes of the analysis. We may instead need attributes on variables, struct members, and (considering the example shown earlier where the author "knows" that some vector's z value is uniform across primitives) vector components, to avoid forcing people to repetitively annotate each use of a value that they know is uniform. That is "spelling" - but it's a lot of it.
Let's make this concrete:
- Add a `uniform` builtin:
  - Signature: `uniform(e : T) -> T`, where T is any type.
  - Semantics: it's the identity function, but also an assertion that the result value is uniform. In terms of the analysis: in the uniformity analysis graph, no arc leaves the `uniform(e)` node, i.e. it cuts a path.
- Add a `@uniform` attribute, that can be applied to `var`, `let`, and formal parameter declarations. It's an assertion that any value read from them is uniform. Cuts the path.
FYI: SPIR-V has Uniform and UniformId decorations, which are like the uniform builtin, i.e. applied to expressions. The "Id" part is for expressing a scope other than 'subgroup'; by default 'Uniform' asserts 'this value is uniform across the subgroup'.
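Under that proposal, the piet-gpu broadcast pattern might be spelled like this (hypothetical syntax; neither `uniform()` nor `@uniform` exists in WGSL today):

```wgsl
// Hypothetical: assert the post-barrier read is uniform, cutting the
// non-uniformity path in the analysis graph.
let ready_ix = uniform(sh_part_count[255]);
if (ready_ix > 0u) {
  workgroupBarrier();  // accepted: the branch condition is asserted uniform
}
```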
I just want to second the point that a shader author needs an "I know what I am doing" option: one should be able to declare that uses of derivatives (or implicit derivatives) in non-uniform control flow are okay and handled. One idea is to give those functions, dFdx, dFdy, fwidth, texture, variants that can be used in a non-uniform situation, maybe something like dFdxNonUniform.
One example where there is non-uniform control, but the results for derivative functions give good values is the following (excuse the use of GLSL)
flat in int I;
in vec2 f;
uniform sampler2D S;
out vec4 out_color;

void main(void)
{
    if (I == 0)
    {
        vec2 ff;
        ff = sin(f) * 0.5 + 0.5; // or any other non-trivial formula
        out_color = texture(S, ff);
    }
    else
    {
        out_color = vec4(1.0, 1.0, 1.0, 1.0);
    }
}
The derivative operations will give correct values, because the if() is from a flat value which is the same across all fragments in a triangle.
The derivative operations will give correct values, because the if() is from a flat value which is the same across all fragments in a triangle.
The example seems reasonable, and resembles the one given in Brendan's earlier comment.
At the moment, the only scope of uniformity that WGSL discusses is the entire draw call. I wonder if it would make sense to have two scopes:
- the draw call, for barriers, and
- the primitive, for derivative operations.
Since barriers can only be used in compute shaders, and derivative operations can only be used in fragment shaders, WGSL could simply assume the appropriate scope for each one. A user-defined function that uses barriers or derivatives is already restricted to use by one kind of entry point or the other, so we should always be able to select a single kind of uniformity to apply for each function.
There are a few distinct notions of uniform control flow, and it is worth spelling them out fully. The main point was for SIMD architectures: if a value depended on another lane within the SIMD group, then the value was there. Another, rarer one is that a value is the same for all lanes across the SIMD group (the one I remember the most was for sampler2D something[N], so that when indexing into something, the array index was the same across the entire SIMD group).
In terms of kinds of uniformity there are more than the two above, here are some that I see for drawing stuff:
- same across the entire draw
- same across an entire instance of a draw
- same across an entire patch (after vertex shading)
- same across an entire triangle (in fragment shading)
When we get into compute there are more, coming naturally from workgroup geometry, the most obvious one being same across the entire dispatch vs. same across just a single workgroup... but for compute there are more still if one thinks about each of the dimensions to some extent.
Though, what is more ideal is to somehow "know" the SIMD-ness (or SIMT-ness) of stuff; however, that is terribly GPU specific. Sighs.
WGSL 2022-10-11 Minutes
- JB: Want to extract as much information from folks now about what they're hitting because as soon as we have an escape hatch we won't get the feedback. Would be good to organize issue to have a list of issues being encountered instead of just discussions.
I'd advise having "strict/safe" and "liberal/unsafe" versions of those functions that take derivatives.
On a related note, here is a real-world example of using early-out-like semantics in a way compatible with derivative ops:
bool
do_early_out(void)
{
    float f;
    f = some_smooth_enough_function_dependent_on_smooth_varyings();
    return f + fwidth(f) < 0.0;
}

out vec4 v;

void
main(void)
{
    if (do_early_out())
    {
        v = vec4(0.0);
        return;
    }
    v = some_expensive_stuff();
}
and then there is this which is even better:
bool
do_early_out(void)
{
    float f;
    f = some_smooth_enough_function_dependent_on_smooth_varyings();
    return f < 0.0;
}

out vec4 v;

void
main(void)
{
    if (allInvocationsARB(do_early_out()))
    {
        v = vec4(0.0);
        return;
    }
    v = some_expensive_stuff();
}
What information are you looking for? I have literally thousands of shader variants with errors that could be provided to help catalog some of the analyses issues. Most of the shader errors come from common patterns that are identified as non-uniform, but I'd rather not spend a lot more time sorting through the shaders to catalog the error sources. The shaders are quite large, and translated through multiple compilers before they get to WGSL, so they are also not very readable.
A number of the fundamental issues of analysis being too strict have been discussed. There are two main categories of analysis issues: false errors due to scanner implementations, and shaders that are purposefully non-uniform.
The first category can possibly be fixed by the implementations, but the requirement that analysis scanning be in linear time and performant makes it difficult.
The second category needs an escape hatch. The escape hatch needs to be at both the global level, for shaders like in Unity that are not written by hand but go through multiple levels of translation; and ideally at the shader code level, so a shader author could tag a block or variable as being acceptably non-uniform. The global level escape hatch is a must-have for Unity, since we don't author WGSL directly.
There's also the third case mentioned above of fragment shaders which are non-uniform according to WGSL's definition, but valid according to the requirements of the low-level APIs.
But really the more high level concern is about the false positive rate. If users are constantly getting non-uniformity errors for valid code, they're just going to start habitually replacing if(...) {} with if(uniform(...)) {} whenever they see one (or flipping the global escape hatch if one is provided) without even stopping to think whether their shader is correct or not. At which point the uniformity analysis isn't actually providing any value at all...
they're just going to start habitually replacing if(...) {} with if(uniform(...)) {} whenever they see one (or flipping the global escape hatch if one is provided) without even stopping to think whether their shader is correct or not. At which point the uniformity analysis isn't actually providing any value at all...
I disagree with that sentiment; most Rust code is not massively decorated with unsafe just to get it to compile. I also don't think an interface of if(uniform()) is anywhere near the right way to go, as declaring that a block of code is uniform when it is not has further consequences beyond whether it is legal to use certain functions. I'd rather see "unsafe" versions of the functions dFdx, dFdy, fwidth, texture, and barrier that can be used regardless of uniformity analysis. That would be the escape hatch for "I know what I am doing". For shaders that originated from other sources (such as Unity) and are translated to WGSL, the translator would just emit the unsafe versions, since those shaders were already working anyway. Though that does require some kind of option in the translator. I admit I am quite uneasy about a global option to turn it all off for an entire shader; it seems to go too far... though maybe I am wrong on that.
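To make the shape of that concrete (all of these names are hypothetical; none of them exist in WGSL, where the derivative builtins are spelled dpdx/dpdy/fwidth):

```wgsl
// Hypothetical unsafe variants: same signatures as the safe builtins, but
// exempt from the uniformity requirement. The author takes responsibility
// for the results in non-uniform control flow.
let d = dpdxNonUniform(v);
let c = textureSampleNonUniform(tex, samp, uv);
```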
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option. Unsafe versions of the offending functions might work, in that I could just filter/replace the safe versions with unsafe versions when generating the WGSL if it's determined the shader is intentionally non-uniform or hit by false errors. Otherwise a global option would be needed in our case.
I disagree with that sentiment; most Rust code is not massively decorated with unsafe just to get it to compile.
An enormous amount of effort went into designing the Rust language so that unsafe was rarely needed and that the borrow checker had a very low false positive rate. Just because WGSL's uniformity analysis looks superficially similar doesn't mean that it will also automatically be ergonomic to use and have a low false positive rate. It would be great it if were! But that requires work now, not just for designing escape hatches, but also finding ways to make sure those escape hatches are almost never needed.
So, I don't think the uniformity analysis checker is completely at fault here. The issue is that clever things are not feasible to detect, and shaders, especially non-trivial ones, have a tendency to do clever things that work. The truth is that the escape hatches are going to be needed a great deal in performance-sensitive code; early outs are a big deal by themselves. As a strong example, the derivative operations are not necessarily broken by non-uniform control flow; they are broken when the control flow for the neighbouring pixels is not the same. See the example I wrote above, where the derivative will still work although there is an early-out breaking uniform control flow. By not marking a block of code, but just having an unsafe function version, the escape hatch stays completely local to the offending function call.
Just because WGSL's uniformity analysis looks superficially similar doesn't mean that it will also automatically be ergonomic to use and have a low false positive rate. It would be great it if were! But that requires work now, not just for designing escape hatches, but also finding ways to make sure those escape hatches are almost never needed.
I agree with all this. I didn't mean to imply that WGSL is fine as it is. My point was similar to yours: people don't simply pursue the lowest possible level of interference, and a well-designed static analysis can be welcomed.
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option. Unsafe versions of the offending functions, might work in that I could just filter/replace the safe versions with unsafe versions when generating the WGSL if it's determined the shader is intentionally non-uniform or hit by false errors. Otherwise a global option would be needed in our case.
If your translator can translate e.g. x = foo() to x = foo_but_assume_uniform(), it could also instead generate x = unsafe { foo() }, right?
@brendan-duncan wrote:
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option.
Like Kelsey, I'm surprised by this. Couldn't you just always generate WGSL whose functions' bodies are entirely wrapped in an unsafe block? (Assuming WGSL grew such a thing.)
@jimblandy We use Tint to generate the WGSL from SPIR-V, so perhaps it could be extended to tag generated functions for us if such a tag existed. Otherwise the other option would be to postprocess the WGSL, scan for functions and insert tags after the fact. If tagging functions as unsafe were the only option, I would make it work. But it would be much simpler to have a GPUShaderModuleCompilationHint to skip analysis for the given shader. If a study of shaders causing analysis issues is something you're interested in, we could work with you to provide a large corpus of shaders for study, similar to what we've done for Google.