[FXC] saturate() on vectors is broken in FXC debug mode, and mojoshader doesn't work around it
This might be an upstream bug.
// FIXME: MojoShader workaround
if (0) {
value = saturate(value);
} else {
value = float3(saturate(value.r), saturate(value.g), saturate(value.b));
}
I think MOD_SATURATE is the name you want to look up in the source. It’s some weird miscellaneous value modifier and it’s most likely only covering certain types of values.
The MOD_SATURATE implementation is correct, it seems like clamp() isn't being generated by the shader at all so I'm going to try and figure out why.
Problem with generated code became evident once I hand-inlined constants:
ps_r1 = texture2D(ps_s0, ps_r1.xy);
ps_r2 = ps_r1 + vec4(0);
ps_r1.x = ((ps_r2.x >= 0.0) ? ps_r1.x : 0);
ps_r1.y = ((ps_r2.y >= 0.0) ? ps_r1.y : 0);
ps_r1.z = ((ps_r2.z >= 0.0) ? ps_r1.z : 0);
ps_r1.w = ((ps_r2.w >= 0.0) ? ps_r1.w : 0);
ps_r2 = ps_r1 + vec4(-1.0);
ps_r1.x = ((ps_r2.z >= 0.0) ? 1.0 : ps_r1.z);
ps_r1.y = ((ps_r2.x >= 0.0) ? 1.0 : ps_r1.x);
ps_r1.z = ((ps_r2.y >= 0.0) ? 1.0 : ps_r1.y);
ps_r1.w = ((ps_r2.w >= 0.0) ? 1.0 : ps_r1.w);
This problem could be hidden in the d3d9 bytecode in a way that doesn't show up through the D3D shader compiler, I'll have to see.
That weird zxyw is in the D3D9 bytecode, so maybe this is a fxc bug? I don't know how it works through the D3D api though.
texld r1, r1, s0
add r2, r1, -0.0
cmp r1, r2, r1, 0.0
add r2, r1, -1.0
cmp r1, r2.zxyw, 1.0, r1.zxyw
mov r1.xyz, r1
mov r0.w, c4.x
mov r2.x, c3.x
mov r1.xyz, r1
add r2.y, r2.x, -1.0
mov r1.x, r1.x
mul r2.z, r2.y, r1.x
frc r2.w, r2.z
mov r2.w, -r2.w
add r3.x, r2.z, r2.w
mov r3.y, -r2.z
mov r3.z, -r3.y
add r3.z, r2.z, r3.z
cmp r3.z, r3.z, 0.0, 1.0
add r2.w, r2.w, r2.w
cmp r2.w, r2.w, 0.0, 1.0
mul r2.w, r2.w, r3.z
add r2.w, r2.w, r3.x
Basic test case seems fine with FXC 9.29.952.3111.
Source:
void PixelPoot(inout float4 color : COLOR0)
{
color = saturate(color);
}
technique T
{
pass P
{
PixelShader = compile ps_3_0 PixelPoot();
}
}
ASM:
technique T
{
pass P
{
//No embedded vertex shader
pixelshader =
asm {
//
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
ps_3_0
dcl_color v0
mov_sat oC0, v0
// approximately 1 instruction slot used
};
}
}
glsl120:
EFFECT: poot.fxb
PROFILE: glsl120
TECHNIQUE #0 ('T'):
PASS #0 ('P'):
STATE 147:
VALUE: (null) -> (null)
CLASS: OBJECT
TYPE: PIXELSHADER
ROWS/COLUMNS/ELEMENTS: 0, 0, 0
TOTAL VALUES: 1
PIXELSHADER VALUES:
OBJECT #1: SHADER, technique 0, pass 0
PROFILE: glsl
SHADER TYPE: pixel
VERSION: 3.0
INSTRUCTION COUNT: 1
MAIN FUNCTION: ShaderFunction1
INPUTS: (none.)
OUTPUTS:
* (null) ("ps_oC0")
CONSTANTS: (none.)
UNIFORMS: (none.)
SAMPLERS: (none.)
SYMBOLS: (none.)
OUTPUT:
#version 120
#define ps_v0 gl_Color
#define ps_oC0 gl_FragColor
void main()
{
ps_oC0 = clamp(ps_v0, vec4(0.0), vec4(1.0));
}
Yeah it only happens at some point in complexity for the shaders. It does show up in the fxc output at that point, which is nice - mojoshader isn't constructing it from thin air. It might be something specific to that swizzle, because it looks like it was added in a later pixel shader profile than the rest of the swizzles. I haven't figured out what causes it to get generated yet.
Forgot to mention, this is very sensitive to optimization level like most of the other bugs I'm finding. When I messed around with fxc, the codegen for this stuff was dramatically different between /Od and /O0 before you even got to the stuff the later levels of optimization do. This problem only occurs in /Od.
Same test with Od:
ASM:
technique T
{
pass P
{
//No embedded vertex shader
pixelshader =
asm {
//
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
ps_3_0
def c0, -0, 0, -1, 1
dcl_color v0
add r0, c0.x, v0
cmp r0, r0, v0, c0.y
add r1, r0, c0.z
cmp oC0, r1, c0.w, r0
// approximately 4 instruction slots used
};
}
}
glsl120:
EFFECT: poot.fxb
PROFILE: glsl120
TECHNIQUE #0 ('T'):
PASS #0 ('P'):
STATE 147:
VALUE: (null) -> (null)
CLASS: OBJECT
TYPE: PIXELSHADER
ROWS/COLUMNS/ELEMENTS: 0, 0, 0
TOTAL VALUES: 1
PIXELSHADER VALUES:
OBJECT #1: SHADER, technique 0, pass 0
PROFILE: glsl
SHADER TYPE: pixel
VERSION: 3.0
INSTRUCTION COUNT: 4
MAIN FUNCTION: ShaderFunction1
INPUTS: (none.)
OUTPUTS:
* (null) ("ps_oC0")
CONSTANTS:
* 0: float (-0.000000 0.000000 -1.000000 1.000000)
UNIFORMS: (none.)
SAMPLERS: (none.)
SYMBOLS: (none.)
OUTPUT:
#version 120
const vec4 ps_c0 = vec4(0.0, 0.0, -1.0, 1.0);
vec4 ps_r0;
vec4 ps_r1;
#define ps_v0 gl_Color
#define ps_oC0 gl_FragColor
void main()
{
ps_r0 = ps_c0.xxxx + ps_v0;
ps_r0.x = ((ps_r0.x >= 0.0) ? ps_v0.x : ps_c0.y);
ps_r0.y = ((ps_r0.y >= 0.0) ? ps_v0.y : ps_c0.y);
ps_r0.z = ((ps_r0.z >= 0.0) ? ps_v0.z : ps_c0.y);
ps_r0.w = ((ps_r0.w >= 0.0) ? ps_v0.w : ps_c0.y);
ps_r1 = ps_r0 + ps_c0.zzzz;
ps_oC0.x = ((ps_r1.x >= 0.0) ? ps_c0.w : ps_r0.x);
ps_oC0.y = ((ps_r1.y >= 0.0) ? ps_c0.w : ps_r0.y);
ps_oC0.z = ((ps_r1.z >= 0.0) ? ps_c0.w : ps_r0.z);
ps_oC0.w = ((ps_r1.w >= 0.0) ? ps_c0.w : ps_r0.w);
}
As... awful as this is, it does appear to be accurate.
Does this help?
\Documents\Projects\Fracture\ext\fxc\fxc.exe /T ps_3_0 \Documents\Projects\FNA\clamptest.fx.txt
/Od /Fo clamptest.fx.bin && testparse.exe glsl120 clamptest.fx.bin > output.glsl.txt
Minor corrections (bad at reducing test cases here):
// bad
value.rgb = saturate(value.rgb);
And then the suspect output from /Od:
ps_r3.x = ((ps_r3.x >= 0.0) ? ps_r1.x : ps_c3.z);
ps_r3.y = ((ps_r3.y >= 0.0) ? ps_r1.y : ps_c3.z);
ps_r3.z = ((ps_r3.z >= 0.0) ? ps_r1.z : ps_c3.z);
ps_r4.xyz = ps_r3.xyz + ps_c3.xxx;
ps_r3.x = ((ps_r4.z >= 0.0) ? ps_c3.w : ps_r3.z);
ps_r3.y = ((ps_r4.x >= 0.0) ? ps_c3.w : ps_r3.x);
ps_r3.z = ((ps_r4.y >= 0.0) ? ps_c3.w : ps_r3.y);
And then for reference, here's that zxyw I noticed before, which was added in a later version of PS (ps_2_0, I think), generated in /Od:
cmp r3.xyz, r4.zxyw, c3.w, r3.zxyw
Here's the /O3 output in comparison:
mov_sat r5.xyz, r5
ps_r5.xyz = clamp(ps_r5.xyz, vec3(0.0), vec3(1.0));
Still looks accurate to me, the only odd thing I'm seeing is this from fxc:
cmp r1.xy, r1.zwzw, v3.zwzw, r1
Weird that it would even bother with zwzw but I guess it's because they didn't want to do r1.xy.