DirectXShaderCompiler icon indicating copy to clipboard operation
DirectXShaderCompiler copied to clipboard

ExecutionTest::UnaryHalfOpTest#AcosHalf is failing on recent Windows builds

Open hekota opened this issue 1 year ago • 5 comments

Description ExecutionTest::UnaryHalfOpTest#AcosHalf is failing on recent Windows builds. Reproes on x64 and arm64.

Steps to Reproduce On pre-release Windows build 23H2 version 26016.1001: hcttest exec-filter ExecutionTest::UnaryHalfOpTest#AcosHalf

Worked fine on Windows build 22621.2715.

Actual Behavior

TAEF: Data[ShaderOp.Text]:  struct SUnaryFPOp {
                float16_t input;
                float16_t output;
            };
            RWStructuredBuffer<SUnaryFPOp> g_buf : register(u0);
            [numthreads(8,8,1)]
            void main(uint GI : SV_GroupIndex) {
                SUnaryFPOp l = g_buf[GI];
                l.output = acos(l.input);
                g_buf[GI] = l;
            };

TAEF: Data[Validation.Expected1]: NaN NaN 1.570796 1.570796 1.570796 1.570796 NaN 0 3.1415926 NaN NaN

TAEF: Data[Validation.Input1]: NaN -Inf -denorm -0 0 denorm Inf 1 -1 1.5 -1.5

TAEF: Data[Validation.Tolerance]: 0.0008

TAEF: Data[Validation.Type]: Epsilon

Using Adapter:Microsoft Basic Render Driver

element #0, input =    nan(0x7fff), output =    nan(0x7fff), expected =    nan(0x7e00)

element #1, input =   -inf(0xfc00), output =    nan(0x7fff), expected =    nan(0x7e00)

element #2, input = -0.00000048(0x8008), output = 1.57128906(0x3e49), expected = 1.57031250(0x3e48)

Error: Verify: IsTrue(CompareOutputWithExpectedValueHalf(output, ref, type, tolerance)) [File: C:\__w\1\s\DXC\tools\clang\unittests\HLSLExec\ExecutionTest.cpp, Function: VerifyOutputWithExpectedValueHalf, Line: 6989]

Environment Pre-release Windows build 23H2 version 26016.1001 DXC main as of 1/23/2024

hekota avatar Jan 23 '24 01:01 hekota

Surprised to see such an old test failing

pow2clk avatar Jan 23 '24 16:01 pow2clk

Updating title and description - this is not ARM64 specific.

hekota avatar Jan 25 '24 00:01 hekota

@jenatali - This seems to be an issue with Microsoft Basic Render Driver, internal bug 43474291.

hekota avatar Feb 06 '24 18:02 hekota

Reopening. According to @jenatali investigation this is a test issue. More info in internal bug #43474291.

hekota avatar Mar 07 '24 20:03 hekota

Replicating the "more info" from the internal bug:

Background: The change that I made that regressed the test was to fix WARP's fp16 rounding. The D3D11 spec has 2 relevant sections:

  • Float->float conversions: when converting to a smaller float, the value should be truncated / round-towards-zero (RTZ).
  • Float math: any higher-precision intermediate values should be rounded into the final floating point result using round-to-nearest-even (RTNE).

WARP ends up doing all fp16 math by converting to fp32, doing fp32 math, and then converting back to fp16, so it was using RTZ because that's what conversions required. I fixed it to follow spec and use RTNE for that conversion, if the conversion is implicit and not due to an explicit conversion instruction.

The failure: When given an input value of "-denorm" (-0.00000048(0x8008)), I get output = 1.57128906(0x3e49), expected = 1.57031250(0x3e48). So the value is off-by-one.

Why does it happen? ​Since WARP is doing this as an fp32 operation, the fp32 intermediate value it produces is 1.570864. This is actually closer to 0x3e49 than 0x3e48, and so rounding up is the correct operation for this fp32 intermediate value.

However there's a few bugs in the test that combine to produce this failure.

  • The test's actual expected output, according to the XML, is 1.570796.
    • That's not a representable fp16 value, meaning that the value in the XML has to be rounded in one direction or another to produce the value that the test is actually using for a comparison.
    • That's not the right value. While WARP's intermediate (1.570864) is within the 0.0008 tolerance, it's not really that close to the right result. The reason is because this is the correct value for an fp32 "-denorm" input. In fp32, denorms get flushed and/or are small enough to not really affect the output, meaning that 1.570796 is actually the correct result for 0, not fp16 -denorm. In fp16, denorms must not get flushed.
  • The test's tolerance is 0.0008. For floating point, I don't think it really makes sense to specify a fixed point tolerance like that. With values as large as these, that tolerance is way too small, but for smaller values, that tolerance could allow completely different results.

So, WARP's producing a value that should be considered correct. However, since other graphics drivers are passing the test as-is, we can't just change the expected result to match WARP's. The test needs to have its tolerance changed so that either rounding direction is acceptable.

My suspicion for how IHVs are passing is that they're doing all of the intermediate calculations for acos in fp16, rounding multiple times along the way, where WARP keeps the data in fp32 and only rounds the final result. Since DXIL doesn't really spec a precision for acos, either of these implementations should be valid.

For what it's worth, the change that I prepared which gave me confidence that this is a test bug was to try to get WARP to use hardware instructions for fp16 conversions (x86 F16C extensions), instead of the software conversion routine that I wrote. Even with that, the test still failed in exactly the same way, which tells me that the conversion is right, and therefore the test must be wrong.

jenatali avatar May 09 '24 17:05 jenatali