mojoshader icon indicating copy to clipboard operation
mojoshader copied to clipboard

[FXC] saturate() on vectors is broken in FXC debug mode, and mojoshader doesn't work around it

Open kg opened this issue 6 years ago • 12 comments

This might be an upstream bug.

    // FIXME: MojoShader workaround
    if (0) {
        value = saturate(value);
    } else {
        value = float3(saturate(value.r), saturate(value.g), saturate(value.b));
    }

kg avatar May 28 '19 01:05 kg

I think MOD_SATURATE is the name you want to look up in the source. It’s some weird miscellaneous value modifier and it’s most likely only covering certain types of values.

flibitijibibo avatar May 28 '19 01:05 flibitijibibo

The MOD_SATURATE implementation is correct, it seems like clamp() isn't being generated by the shader at all so I'm going to try and figure out why.

kg avatar May 29 '19 03:05 kg

Problem with generated code became evident once I hand-inlined constants:

    ps_r1 = texture2D(ps_s0, ps_r1.xy);
    ps_r2 = ps_r1 + vec4(0);
    ps_r1.x = ((ps_r2.x >= 0.0) ? ps_r1.x : 0);
    ps_r1.y = ((ps_r2.y >= 0.0) ? ps_r1.y : 0);
    ps_r1.z = ((ps_r2.z >= 0.0) ? ps_r1.z : 0);
    ps_r1.w = ((ps_r2.w >= 0.0) ? ps_r1.w : 0);
    ps_r2 = ps_r1 + vec4(-1.0);
    ps_r1.x = ((ps_r2.z >= 0.0) ? 1.0 : ps_r1.z);
    ps_r1.y = ((ps_r2.x >= 0.0) ? 1.0 : ps_r1.x);
    ps_r1.z = ((ps_r2.y >= 0.0) ? 1.0 : ps_r1.y);
    ps_r1.w = ((ps_r2.w >= 0.0) ? 1.0 : ps_r1.w);

This problem could be hidden in the d3d9 bytecode in a way that doesn't show up through the D3D shader compiler, I'll have to see.

kg avatar May 29 '19 03:05 kg

That weird zxyw is in the D3D9 bytecode, so maybe this is a fxc bug? I don't know how it works through the D3D api though.

    texld r1, r1, s0
    add r2, r1, -0.0
    cmp r1, r2, r1, 0.0
    add r2, r1, -1.0
    cmp r1, r2.zxyw, 1.0, r1.zxyw
    mov r1.xyz, r1
    mov r0.w, c4.x
    mov r2.x, c3.x
    mov r1.xyz, r1
    add r2.y, r2.x, -1.0
    mov r1.x, r1.x
    mul r2.z, r2.y, r1.x
    frc r2.w, r2.z
    mov r2.w, -r2.w
    add r3.x, r2.z, r2.w
    mov r3.y, -r2.z
    mov r3.z, -r3.y
    add r3.z, r2.z, r3.z
    cmp r3.z, r3.z, 0.0, 1.0
    add r2.w, r2.w, r2.w
    cmp r2.w, r2.w, 0.0, 1.0
    mul r2.w, r2.w, r3.z
    add r2.w, r2.w, r3.x

kg avatar May 29 '19 03:05 kg

Basic test case seems fine with FXC 9.29.952.3111.

Source:

void PixelPoot(inout float4 color : COLOR0)
{
        color = saturate(color);
}

technique T
{
        pass P
        {
                PixelShader = compile ps_3_0 PixelPoot();
        }
}

ASM:

technique T
{
    pass P
    {
        //No embedded vertex shader
        pixelshader = 
            asm {
            //
            // Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
                ps_3_0
                dcl_color v0
                mov_sat oC0, v0
            
            // approximately 1 instruction slot used
            };
    }
}

glsl120:

EFFECT: poot.fxb
    PROFILE: glsl120


    TECHNIQUE #0 ('T'):
        PASS #0 ('P'):
            STATE 147:
                VALUE: (null) -> (null)
                    CLASS: OBJECT
                    TYPE: PIXELSHADER
                    ROWS/COLUMNS/ELEMENTS: 0, 0, 0
                    TOTAL VALUES: 1
                    PIXELSHADER VALUES:

    OBJECT #1: SHADER, technique 0, pass 0
        PROFILE: glsl
        SHADER TYPE: pixel
        VERSION: 3.0
        INSTRUCTION COUNT: 1
        MAIN FUNCTION: ShaderFunction1
        INPUTS: (none.)
        OUTPUTS:
            * (null) ("ps_oC0")
        CONSTANTS: (none.)
        UNIFORMS: (none.)
        SAMPLERS: (none.)
        SYMBOLS: (none.)
        OUTPUT:
            #version 120
            #define ps_v0 gl_Color
            #define ps_oC0 gl_FragColor
            
            void main()
            {
            	ps_oC0 = clamp(ps_v0, vec4(0.0), vec4(1.0));
            }

flibitijibibo avatar May 29 '19 17:05 flibitijibibo

Yeah it only happens at some point in complexity for the shaders. It does show up in the fxc output at that point, which is nice - mojoshader isn't constructing it from thin air. It might be something specific to that swizzle, because it looks like it was added in a later pixel shader profile than the rest of the swizzles. I haven't figured out what causes it to get generated yet.

kg avatar May 29 '19 17:05 kg

Forgot to mention, this is very sensitive to optimization level like most of the other bugs I'm finding. When I messed around with fxc, the codegen for this stuff was dramatically different between /Od and /O0 before you even got to the stuff the later levels of optimization do. This problem only occurs in /Od.

kg avatar May 29 '19 17:05 kg

Same test with Od:

ASM:

technique T
{
    pass P
    {
        //No embedded vertex shader
        pixelshader = 
            asm {
            //
            // Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
                ps_3_0
                def c0, -0, 0, -1, 1
                dcl_color v0
                add r0, c0.x, v0
                cmp r0, r0, v0, c0.y
                add r1, r0, c0.z
                cmp oC0, r1, c0.w, r0
            
            // approximately 4 instruction slots used
            };
    }
}

glsl120:

EFFECT: poot.fxb
    PROFILE: glsl120


    TECHNIQUE #0 ('T'):
        PASS #0 ('P'):
            STATE 147:
                VALUE: (null) -> (null)
                    CLASS: OBJECT
                    TYPE: PIXELSHADER
                    ROWS/COLUMNS/ELEMENTS: 0, 0, 0
                    TOTAL VALUES: 1
                    PIXELSHADER VALUES:

    OBJECT #1: SHADER, technique 0, pass 0
        PROFILE: glsl
        SHADER TYPE: pixel
        VERSION: 3.0
        INSTRUCTION COUNT: 4
        MAIN FUNCTION: ShaderFunction1
        INPUTS: (none.)
        OUTPUTS:
            * (null) ("ps_oC0")
        CONSTANTS:
            * 0: float (-0.000000 0.000000 -1.000000 1.000000)
        UNIFORMS: (none.)
        SAMPLERS: (none.)
        SYMBOLS: (none.)
        OUTPUT:
            #version 120
            const vec4 ps_c0 = vec4(0.0, 0.0, -1.0, 1.0);
            vec4 ps_r0;
            vec4 ps_r1;
            #define ps_v0 gl_Color
            #define ps_oC0 gl_FragColor
            
            void main()
            {
            	ps_r0 = ps_c0.xxxx + ps_v0;
            	ps_r0.x = ((ps_r0.x >= 0.0) ? ps_v0.x : ps_c0.y);
            	ps_r0.y = ((ps_r0.y >= 0.0) ? ps_v0.y : ps_c0.y);
            	ps_r0.z = ((ps_r0.z >= 0.0) ? ps_v0.z : ps_c0.y);
            	ps_r0.w = ((ps_r0.w >= 0.0) ? ps_v0.w : ps_c0.y);
            	ps_r1 = ps_r0 + ps_c0.zzzz;
            	ps_oC0.x = ((ps_r1.x >= 0.0) ? ps_c0.w : ps_r0.x);
            	ps_oC0.y = ((ps_r1.y >= 0.0) ? ps_c0.w : ps_r0.y);
            	ps_oC0.z = ((ps_r1.z >= 0.0) ? ps_c0.w : ps_r0.z);
            	ps_oC0.w = ((ps_r1.w >= 0.0) ? ps_c0.w : ps_r0.w);
            }

As... awful as this is, it does appear to be accurate.

flibitijibibo avatar May 29 '19 17:05 flibitijibibo

Does this help?

\Documents\Projects\Fracture\ext\fxc\fxc.exe /T ps_3_0 \Documents\Projects\FNA\clamptest.fx.txt
 /Od /Fo clamptest.fx.bin && testparse.exe glsl120 clamptest.fx.bin > output.glsl.txt

clamptest.fx.txt output.glsl.txt

kg avatar May 29 '19 21:05 kg

Minor corrections (bad at reducing test cases here):

    // bad
    value.rgb = saturate(value.rgb);

And then the suspect output from /Od:

        	ps_r3.x = ((ps_r3.x >= 0.0) ? ps_r1.x : ps_c3.z);
        	ps_r3.y = ((ps_r3.y >= 0.0) ? ps_r1.y : ps_c3.z);
        	ps_r3.z = ((ps_r3.z >= 0.0) ? ps_r1.z : ps_c3.z);
        	ps_r4.xyz = ps_r3.xyz + ps_c3.xxx;
        	ps_r3.x = ((ps_r4.z >= 0.0) ? ps_c3.w : ps_r3.z);
        	ps_r3.y = ((ps_r4.x >= 0.0) ? ps_c3.w : ps_r3.x);
        	ps_r3.z = ((ps_r4.y >= 0.0) ? ps_c3.w : ps_r3.y);

kg avatar May 29 '19 21:05 kg

And then for reference, here's that zxyw I noticed before, which was added in a later version of PS (ps_2_0, I think), generated in /Od:

    cmp r3.xyz, r4.zxyw, c3.w, r3.zxyw

Here's the /O3 output in comparison:

    mov_sat r5.xyz, r5
        	ps_r5.xyz = clamp(ps_r5.xyz, vec3(0.0), vec3(1.0));

kg avatar May 29 '19 21:05 kg

Still looks accurate to me, the only odd thing I'm seeing is this from fxc:

cmp r1.xy, r1.zwzw, v3.zwzw, r1

Weird that it would even bother with zwzw but I guess it's because they didn't want to do r1.xy.

flibitijibibo avatar May 29 '19 21:05 flibitijibibo