Daemon icon indicating copy to clipboard operation
Daemon copied to clipboard

Model bugs with latest GCC 14 and Clang 19

Open illwieckz opened this issue 1 year ago • 5 comments

I remember @DolceTriade said one day he seen bugs while compiling with GCC 14.

I confirm I see bugs affecting models in a weird way when compiling with GCC 14 (no issue with GCC 13):

unvanquished_2024-09-13_181819_000

unvanquished_2024-09-13_181834_000

First stable version of Clang 19 was released today and then out of curiosity I test it, and I got another bug but affecting the models too, but in a different way (no issue with Clang 18):

unvanquished_2024-09-17_200312_000

unvanquished_2024-09-17_200930_000

Maybe we actually do something wrong if two compilers fail on it.

illwieckz avatar Sep 17 '24 18:09 illwieckz

An easy way to test both GCC 14 and Clang 19 is to use Ubuntu Noble (it has GCC 14 in repositories) and to install Clang 19 from the LLVM upstream repository for Ubuntu: https://apt.llvm.org

illwieckz avatar Sep 17 '24 18:09 illwieckz

As usual this doesn't happen with GCC 14.2 on Windows.

Does it matter what vm.cgame.type you use? Does it matter if you disable r_vboModels?

slipher avatar Sep 17 '24 18:09 slipher

I get the same results with both DLL and EXE cgame.

I get the same results with r_vboVertexSkinning 0 and r_vboModel 0 (actually r_vboVertexSkinning 0 has to be preferred over r_vboModel 0).

illwieckz avatar Sep 17 '24 19:09 illwieckz

Interesting, with NEXE cgame, I reproduce the Clang 19 bug, but not the GCC 14 bug.

illwieckz avatar Sep 17 '24 19:09 illwieckz

Removing -ffast-math fixes issue on GCC14 for me.

DolceTriade avatar Oct 15 '24 14:10 DolceTriade

Engine build with GCC13 (unaffected) with game built with current Saigo (based on Clang 19, affected):

unvanquished_2024-11-19_234514_000

unvanquished_2024-11-19_234532_000

So that looks to be a game code bug.

illwieckz avatar Nov 19 '24 22:11 illwieckz

The bug affects both the engine and the cgame, but it looks different if both engine and games are compiled with affected compilers than only one of them.

engine game result
GCC 13 no fast math Saigo/Clang 19 no fast math ✅️
GCC 13 fast math Saigo/Clang 19 no fast math ✅️
GCC 13 fast math Saigo/Clang 19 fast math ❌️
Clang 19 no fast math Saigo/Clang 19 no fast math ✅️
Clang 19 no fast math Saigo/Clang 19 fast math ❌️
Clang 19 fast math Saigo/Clang 19 no fast math ❌️
Clang 19 fast math Saigo/Clang 19 fast math ❌️

The weirdness seen with non-fast-math Clang 19 engine with fast-math Saigo is the same as fast-math GCC 13 engine with fast-math Saigo.

illwieckz avatar Nov 19 '24 23:11 illwieckz

I have no idea what is causing this, I was hoping to catch some division by zero.

I have noticed that using a debug build workarounds the bug too, even when enabling fast math.

illwieckz avatar Nov 27 '24 22:11 illwieckz

I wonder if some underflow I catch on model loading is related or not:

  • https://github.com/Unvanquished/Unvanquished/pull/3204#issuecomment-2495076144

illwieckz avatar Nov 27 '24 22:11 illwieckz

According to https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html the -ffast-math option does:

Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.

So with GCC 14 (known to reproduce the bug) I replaced -ffast-math with all of them and disabled some of them to see when I reproduced or not the bug. I noticed I can enable all of them but -funsafe-math-optimizations and -ffinite-math-only and get no bug.

But, I could also enable either one of -funsafe-math-optimizations or -ffinite-math-only and get no bug. Only the combination of both reproduces the bug.

illwieckz avatar Nov 28 '24 01:11 illwieckz

That GCC documentation page also says that -funsafe-math-optimizations does:

Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.

But when I replace -funsafe-math-optimizations with all of them, the bug is not reproduced.

illwieckz avatar Nov 28 '24 01:11 illwieckz

Interestingly, using -funsafe-math-optimizations but with -fsigned-zeros, -ftrapping-math, -fno-associative-math and -fnoreciprocal-math (so the reverse of what it enables), the bug disappears.

illwieckz avatar Nov 28 '24 01:11 illwieckz

It looks like what makes the difference is to use -fsigned-zeros or not, using -fno-signed-zeros reproduces the bug on GCC 14.

illwieckz avatar Nov 28 '24 01:11 illwieckz

But using -ffast-math with -fsigned-zeros or -fno-signed-zeros always reproduce the bug on GCC 14.

It should be noted that using -fsigned-zeros forces the use of -fno-fassociative-math, but using this one without -fsigned-zeros doesn't reproduce the bug.

So the flag combination is not totally reversible, but the bug has strong chance to be related to using -fno-signed-zeros anyway (and maybe -fassociative-math).

illwieckz avatar Nov 28 '24 01:11 illwieckz

Hmm, good to know, using -ftrapping-math with -ffast-math makes the bug go away (-ftrapping-math also forces -fno-associative-math).

So, that may still be some division by zero or things like that, the problem is that we cannot catch it with feenableexcept() when when -ffast-math and then -fno-trapping-math is set because:

Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation.

illwieckz avatar Nov 28 '24 01:11 illwieckz

But when I use -ffast-math with -ftrapping-math, the bug disappears and I catch no division by zero. If that's due to some other errors (like underflow), I can't catch them because of the already known ones being caught at load time.

illwieckz avatar Nov 28 '24 02:11 illwieckz

Also using -ffast-math with -fsignaling-nans makes the but go away (but it is says -fno-trapping-math requires -fno-trapping-math to be used instead).

illwieckz avatar Nov 28 '24 02:11 illwieckz

It works without bug on CLANG 19 with -ffast-math -fhonor-nans, so that's likely a NaN occurring somewhere.

It's likely not a division by zero because feenableexcept(FE_DIVBYZERO); doesn't catch it, and -ffast-math -fhonor-infinities -ftrapping-math doesn't as well.

illwieckz avatar Nov 29 '24 07:11 illwieckz

R_TBNtoQtangents is full of NaN.

illwieckz avatar Nov 29 '24 08:11 illwieckz

Disabling custom SSE code workarounds the bug.

illwieckz avatar Nov 29 '24 08:11 illwieckz

Nice find!

We should also see if -ffast-math actually makes a noticeable difference in performance.

DolceTriade avatar Nov 29 '24 23:11 DolceTriade

A quick gut check for plat23 as spec in A base (/setviewpos 2966 1514 292 127 17) with release build and LTO on shows: no -ffast-math: 577 (stable) with -ffast-math: 578-579 (unstable)

Based on this single datapoint, it might not even be worth the hassle.

DolceTriade avatar Nov 29 '24 23:11 DolceTriade

TransAddRotationQuat doesn't seem to work correctly with Clang 19. I have a failing unit test:

static void ExpectTransformEqual(const transform_t &t1, const transform_t &t2)
{
    for (int i = 0; i < 8; i++)
    {
        float n1, n2;
        memcpy(&n1, reinterpret_cast<const char*>(&t1) + 4 * i, 4);
        memcpy(&n2, reinterpret_cast<const char*>(&t2) + 4 * i, 4);
        EXPECT_FLOAT_EQ(n1, n2) << "transform_t's unequal at offset " << 4 * i;
    }
}

TEST(QMathTest, TransAddRotationQuat)
{
    transform_t t1, t2;
    TransInit(&t1);
    TransInit(&t2);

    quat_t q;
    QuatFromAngles(q, 33, 59, 124);

    TransAddRotationQuat(q, &t1);
    TransInsRotationQuat(q, &t2);

    ExpectTransformEqual(t1, t2);
}

This unit test might not even be correct if there is more than 1 way to represent a quat or something, but it passes with another compiler.

slipher avatar Nov 30 '24 02:11 slipher

I added __attribute__((noinline)) to TransAddRotationQuat so I could disassemble it. The last 3 instructions set the entire sseTransScale part of the transform to 0. All of our coordinates being multiplied by 0 could certainly explain models not showing up.

Dump of assembler code for function TransAddRotationQuat(float const*, transform_t*):
   0x0000000000039e40 <+0>:	movups (%rdi),%xmm1
   0x0000000000039e43 <+3>:	movaps %xmm1,%xmm0
   0x0000000000039e46 <+6>:	shufps $0xff,%xmm1,%xmm0
   0x0000000000039e4a <+10>:	movaps (%rsi),%xmm2
   0x0000000000039e4d <+13>:	mulps  %xmm2,%xmm0
   0x0000000000039e50 <+16>:	movaps %xmm1,%xmm3
   0x0000000000039e53 <+19>:	shufps $0x24,%xmm1,%xmm3
   0x0000000000039e57 <+23>:	movaps %xmm2,%xmm4
   0x0000000000039e5a <+26>:	shufps $0x3f,%xmm2,%xmm4
   0x0000000000039e5e <+30>:	mulps  %xmm3,%xmm4
   0x0000000000039e61 <+33>:	movaps 0xad6e8(%rip),%xmm3        # 0xe7550
   0x0000000000039e68 <+40>:	xorps  %xmm3,%xmm4
   0x0000000000039e6b <+43>:	movaps %xmm1,%xmm5
   0x0000000000039e6e <+46>:	shufps $0x49,%xmm1,%xmm5
   0x0000000000039e72 <+50>:	movaps %xmm2,%xmm6
   0x0000000000039e75 <+53>:	shufps $0x52,%xmm2,%xmm6
   0x0000000000039e79 <+57>:	mulps  %xmm5,%xmm6
   0x0000000000039e7c <+60>:	xorps  %xmm3,%xmm6
   0x0000000000039e7f <+63>:	addps  %xmm4,%xmm6
   0x0000000000039e82 <+66>:	shufps $0x89,%xmm2,%xmm2
   0x0000000000039e86 <+70>:	shufps $0x92,%xmm1,%xmm1
   0x0000000000039e8a <+74>:	mulps  %xmm2,%xmm1
   0x0000000000039e8d <+77>:	subps  %xmm1,%xmm0
   0x0000000000039e90 <+80>:	addps  %xmm6,%xmm0
   0x0000000000039e93 <+83>:	movaps %xmm0,(%rsi)
   0x0000000000039e96 <+86>:	xorps  %xmm0,%xmm0
   0x0000000000039e99 <+89>:	movaps %xmm0,0x10(%rsi)
   0x0000000000039e9d <+93>:	ret    

slipher avatar Nov 30 '24 03:11 slipher

The problem with Clang is that it now can't correctly handle SSE bit mask intrinsics on floating point vectors when -ffinite-math is on: https://github.com/llvm/llvm-project/issues/118152. The arguments to _mm_and_ps, _mm_or_ps, etc., may be bit masks, which Clang wrongly interprets as floats. When some component of the mask is all ones, it interprets it as NaN and invokes undefined behavior.

We can work around this by using shuffle intrinsics instead of bit masks. These are presumably faster anyway: if you put the code with masks into (a non-bugged configuration of) Clang, it transforms them into shuffles.

slipher avatar Nov 30 '24 03:11 slipher

Disabling SSE doesn't fix the GCC 14 bug, so that's a different bug than the Clang one.

illwieckz avatar Dec 02 '24 13:12 illwieckz

I wonder what is that remaining GCC bug. I noticed the distortion is affected by the orientation of the model on vertical axis. With some angles there is no distortion, with some there is a few, with some others there is much.

illwieckz avatar Dec 19 '24 22:12 illwieckz

This can be seen while playing with a dretch and cg_thirdPerson enabled and moving around.

illwieckz avatar Dec 19 '24 22:12 illwieckz

I now have a branch with divby0, overflow and invalid float exceptions and all of them fixed, but I still get the model distortion while not getting any of those float exceptions…

I now suspect a compiler bug in some optimization.

illwieckz avatar Dec 20 '24 01:12 illwieckz

So, as a summary, the -ftrapping-math flag is making the bug go away.

flags result
-ffast-math -fno-associative-math ❌️ bug
-ffast-math -fno-associative-math -ftrapping-math ✅️ no bug

illwieckz avatar Dec 20 '24 03:12 illwieckz