Daemon
Daemon copied to clipboard
Model bugs with latest GCC 14 and Clang 19
I remember @DolceTriade said one day he seen bugs while compiling with GCC 14.
I confirm I see bugs affecting models in a weird way when compiling with GCC 14 (no issue with GCC 13):
First stable version of Clang 19 was released today and then out of curiosity I test it, and I got another bug but affecting the models too, but in a different way (no issue with Clang 18):
Maybe we actually do something wrong if two compilers fail on it.
An easy way to test both GCC 14 and Clang 19 is to use Ubuntu Noble (it has GCC 14 in repositories) and to install Clang 19 from the LLVM upstream repository for Ubuntu: https://apt.llvm.org
As usual this doesn't happen with GCC 14.2 on Windows.
Does it matter what vm.cgame.type you use? Does it matter if you disable r_vboModels?
I get the same results with both DLL and EXE cgame.
I get the same results with r_vboVertexSkinning 0 and r_vboModel 0 (actually r_vboVertexSkinning 0 has to be preferred over r_vboModel 0).
Interesting, with NEXE cgame, I reproduce the Clang 19 bug, but not the GCC 14 bug.
Removing -ffast-math fixes issue on GCC14 for me.
Engine build with GCC13 (unaffected) with game built with current Saigo (based on Clang 19, affected):
So that looks to be a game code bug.
The bug affects both the engine and the cgame, but it looks different if both engine and games are compiled with affected compilers than only one of them.
| engine | game | result |
|---|---|---|
| GCC 13 no fast math | Saigo/Clang 19 no fast math | ✅️ |
| GCC 13 fast math | Saigo/Clang 19 no fast math | ✅️ |
| GCC 13 fast math | Saigo/Clang 19 fast math | ❌️ |
| Clang 19 no fast math | Saigo/Clang 19 no fast math | ✅️ |
| Clang 19 no fast math | Saigo/Clang 19 fast math | ❌️ |
| Clang 19 fast math | Saigo/Clang 19 no fast math | ❌️ |
| Clang 19 fast math | Saigo/Clang 19 fast math | ❌️ |
The weirdness seen with non-fast-math Clang 19 engine with fast-math Saigo is the same as fast-math GCC 13 engine with fast-math Saigo.
I have no idea what is causing this, I was hoping to catch some division by zero.
I have noticed that using a debug build workarounds the bug too, even when enabling fast math.
I wonder if some underflow I catch on model loading is related or not:
- https://github.com/Unvanquished/Unvanquished/pull/3204#issuecomment-2495076144
According to https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html the -ffast-math option does:
Sets the options
-fno-math-errno,-funsafe-math-optimizations,-ffinite-math-only,-fno-rounding-math,-fno-signaling-nans,-fcx-limited-rangeand-fexcess-precision=fast.
So with GCC 14 (known to reproduce the bug) I replaced -ffast-math with all of them and disabled some of them to see when I reproduced or not the bug. I noticed I can enable all of them but -funsafe-math-optimizations and -ffinite-math-only and get no bug.
But, I could also enable either one of -funsafe-math-optimizations or -ffinite-math-only and get no bug. Only the combination of both reproduces the bug.
That GCC documentation page also says that -funsafe-math-optimizations does:
Enables
-fno-signed-zeros,-fno-trapping-math,-fassociative-mathand-freciprocal-math.
But when I replace -funsafe-math-optimizations with all of them, the bug is not reproduced.
Interestingly, using -funsafe-math-optimizations but with -fsigned-zeros, -ftrapping-math, -fno-associative-math and -fnoreciprocal-math (so the reverse of what it enables), the bug disappears.
It looks like what makes the difference is to use -fsigned-zeros or not, using -fno-signed-zeros reproduces the bug on GCC 14.
But using -ffast-math with -fsigned-zeros or -fno-signed-zeros always reproduce the bug on GCC 14.
It should be noted that using -fsigned-zeros forces the use of -fno-fassociative-math, but using this one without -fsigned-zeros doesn't reproduce the bug.
So the flag combination is not totally reversible, but the bug has strong chance to be related to using -fno-signed-zeros anyway (and maybe -fassociative-math).
Hmm, good to know, using -ftrapping-math with -ffast-math makes the bug go away (-ftrapping-math also forces -fno-associative-math).
So, that may still be some division by zero or things like that, the problem is that we cannot catch it with feenableexcept() when when -ffast-math and then -fno-trapping-math is set because:
Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation.
But when I use -ffast-math with -ftrapping-math, the bug disappears and I catch no division by zero. If that's due to some other errors (like underflow), I can't catch them because of the already known ones being caught at load time.
Also using -ffast-math with -fsignaling-nans makes the but go away (but it is says -fno-trapping-math requires -fno-trapping-math to be used instead).
It works without bug on CLANG 19 with -ffast-math -fhonor-nans, so that's likely a NaN occurring somewhere.
It's likely not a division by zero because feenableexcept(FE_DIVBYZERO); doesn't catch it, and -ffast-math -fhonor-infinities -ftrapping-math doesn't as well.
R_TBNtoQtangents is full of NaN.
Disabling custom SSE code workarounds the bug.
Nice find!
We should also see if -ffast-math actually makes a noticeable difference in performance.
A quick gut check for plat23 as spec in A base (/setviewpos 2966 1514 292 127 17) with release build and LTO on shows:
no -ffast-math: 577 (stable)
with -ffast-math: 578-579 (unstable)
Based on this single datapoint, it might not even be worth the hassle.
TransAddRotationQuat doesn't seem to work correctly with Clang 19. I have a failing unit test:
static void ExpectTransformEqual(const transform_t &t1, const transform_t &t2)
{
for (int i = 0; i < 8; i++)
{
float n1, n2;
memcpy(&n1, reinterpret_cast<const char*>(&t1) + 4 * i, 4);
memcpy(&n2, reinterpret_cast<const char*>(&t2) + 4 * i, 4);
EXPECT_FLOAT_EQ(n1, n2) << "transform_t's unequal at offset " << 4 * i;
}
}
TEST(QMathTest, TransAddRotationQuat)
{
transform_t t1, t2;
TransInit(&t1);
TransInit(&t2);
quat_t q;
QuatFromAngles(q, 33, 59, 124);
TransAddRotationQuat(q, &t1);
TransInsRotationQuat(q, &t2);
ExpectTransformEqual(t1, t2);
}
This unit test might not even be correct if there is more than 1 way to represent a quat or something, but it passes with another compiler.
I added __attribute__((noinline)) to TransAddRotationQuat so I could disassemble it. The last 3 instructions set the entire sseTransScale part of the transform to 0. All of our coordinates being multiplied by 0 could certainly explain models not showing up.
Dump of assembler code for function TransAddRotationQuat(float const*, transform_t*):
0x0000000000039e40 <+0>: movups (%rdi),%xmm1
0x0000000000039e43 <+3>: movaps %xmm1,%xmm0
0x0000000000039e46 <+6>: shufps $0xff,%xmm1,%xmm0
0x0000000000039e4a <+10>: movaps (%rsi),%xmm2
0x0000000000039e4d <+13>: mulps %xmm2,%xmm0
0x0000000000039e50 <+16>: movaps %xmm1,%xmm3
0x0000000000039e53 <+19>: shufps $0x24,%xmm1,%xmm3
0x0000000000039e57 <+23>: movaps %xmm2,%xmm4
0x0000000000039e5a <+26>: shufps $0x3f,%xmm2,%xmm4
0x0000000000039e5e <+30>: mulps %xmm3,%xmm4
0x0000000000039e61 <+33>: movaps 0xad6e8(%rip),%xmm3 # 0xe7550
0x0000000000039e68 <+40>: xorps %xmm3,%xmm4
0x0000000000039e6b <+43>: movaps %xmm1,%xmm5
0x0000000000039e6e <+46>: shufps $0x49,%xmm1,%xmm5
0x0000000000039e72 <+50>: movaps %xmm2,%xmm6
0x0000000000039e75 <+53>: shufps $0x52,%xmm2,%xmm6
0x0000000000039e79 <+57>: mulps %xmm5,%xmm6
0x0000000000039e7c <+60>: xorps %xmm3,%xmm6
0x0000000000039e7f <+63>: addps %xmm4,%xmm6
0x0000000000039e82 <+66>: shufps $0x89,%xmm2,%xmm2
0x0000000000039e86 <+70>: shufps $0x92,%xmm1,%xmm1
0x0000000000039e8a <+74>: mulps %xmm2,%xmm1
0x0000000000039e8d <+77>: subps %xmm1,%xmm0
0x0000000000039e90 <+80>: addps %xmm6,%xmm0
0x0000000000039e93 <+83>: movaps %xmm0,(%rsi)
0x0000000000039e96 <+86>: xorps %xmm0,%xmm0
0x0000000000039e99 <+89>: movaps %xmm0,0x10(%rsi)
0x0000000000039e9d <+93>: ret
The problem with Clang is that it now can't correctly handle SSE bit mask intrinsics on floating point vectors when -ffinite-math is on: https://github.com/llvm/llvm-project/issues/118152. The arguments to _mm_and_ps, _mm_or_ps, etc., may be bit masks, which Clang wrongly interprets as floats. When some component of the mask is all ones, it interprets it as NaN and invokes undefined behavior.
We can work around this by using shuffle intrinsics instead of bit masks. These are presumably faster anyway: if you put the code with masks into (a non-bugged configuration of) Clang, it transforms them into shuffles.
Disabling SSE doesn't fix the GCC 14 bug, so that's a different bug than the Clang one.
I wonder what is that remaining GCC bug. I noticed the distortion is affected by the orientation of the model on vertical axis. With some angles there is no distortion, with some there is a few, with some others there is much.
This can be seen while playing with a dretch and cg_thirdPerson enabled and moving around.
I now have a branch with divby0, overflow and invalid float exceptions and all of them fixed, but I still get the model distortion while not getting any of those float exceptions…
I now suspect a compiler bug in some optimization.
So, as a summary, the -ftrapping-math flag is making the bug go away.
| flags | result |
|---|---|
-ffast-math -fno-associative-math |
❌️ bug |
-ffast-math -fno-associative-math -ftrapping-math |
✅️ no bug |