ldc
Test failures on linux aarch64
Some tests fail on my Raspberry Pi 4 running Gentoo.
The first issue is with std.internal.math.gammafunction:
ctest -R 'std.internal.math.gammafunction'
Test project /home/ldc/build
Start 377: std.internal.math.gammafunction
1/4 Test #377: std.internal.math.gammafunction ................***Failed 0.02 sec
Start 812: std.internal.math.gammafunction-debug
2/4 Test #812: std.internal.math.gammafunction-debug ..........***Failed 5.70 sec
Start 1247: std.internal.math.gammafunction-shared
3/4 Test #1247: std.internal.math.gammafunction-shared .........***Failed 0.04 sec
Start 1682: std.internal.math.gammafunction-debug-shared
4/4 Test #1682: std.internal.math.gammafunction-debug-shared ...***Failed 0.07 sec
This is because https://github.com/dlang/phobos/blob/14b23633b762cfd7b03614dca4c6b0cafa1016e5/std/internal/math/gammafunction.d#L396 uses real.mant_dig, which is 64 on my PC but 113 on the Pi. I guess the fix is to use a constant there instead of depending on real, since that type varies by platform, but I'm not 100% sure what to do.
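To check which real format a given target uses, a small sketch like the following may help. The helper name realFormat is made up for illustration; only real.mant_dig comes from the language itself.

```d
import std.stdio : writefln;

/// Hypothetical helper: names the floating-point format behind `real`
/// based on its mantissa width.
string realFormat()
{
    static if (real.mant_dig == 53)
        return "IEEE double (64-bit)";
    else static if (real.mant_dig == 64)
        return "x87 extended (80-bit)";
    else static if (real.mant_dig == 113)
        return "IEEE quadruple (128-bit)";
    else
        return "other";
}

void main()
{
    // Per the report above: 64 on the x86 PC, 113 on the Raspberry Pi 4.
    writefln("real.mant_dig = %s (%s)", real.mant_dig, realFormat());
}
```

Code that hard-codes expectations for one mantissa width can use the same static if pattern to pick per-format constants.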
Secondly, std.math.exponential:
ctest -R 'std.math.exponential'
Test project /home/ldc/build
Start 397: std.math.exponential
1/4 Test #397: std.math.exponential ................***Failed 0.02 sec
Start 832: std.math.exponential-debug
2/4 Test #832: std.math.exponential-debug .......... Passed 0.03 sec
Start 1267: std.math.exponential-shared
3/4 Test #1267: std.math.exponential-shared .........***Failed 0.04 sec
Start 1702: std.math.exponential-debug-shared
4/4 Test #1702: std.math.exponential-debug-shared ... Passed 0.07 sec
Which is weird since only the release builds fail. I've tried to minimize the test case to:
real pow(real x, real y) @trusted @nogc pure nothrow
{
    long iy = cast(long) y;
    //assert(iy != y);
    if (iy == y) {
        assert(false);
    }
    assert(false);
}
void main () {
    import std.math.traits : isNaN;
    assert(isNaN(pow(-1.0L, 1/real.epsilon - 0.5L)));
}
The problem with this one is that the assert fails on different lines based on optimizations:
../bin/ldc2 -O -run repro.d
core.exception.AssertError@repro.d(6): Assertion failure
----------------
??:? [0x556070c0bf]
??:? [0x556070bd0b]
??:? [0x556072f28f]
??:? [0x5560711b6b]
??:? [0x556070a9d7]
??:? [0x5560709faf]
??:? [0x5560711837]
??:? [0x556071171b]
??:? [0x5560711587]
??:? [0x7fbd44738b]
??:? __libc_start_main [0x7fbd44745f]
??:? [0x5560709eaf]
Error: /tmp/repro-d8ac08 failed with status: 1
and
./bin/ldc2 -run repro.d
core.exception.AssertError@repro.d(8): Assertion failure
----------------
??:? [0x557b4cc1db]
??:? [0x557b4cbe27]
??:? [0x557b4ef3ab]
??:? [0x557b4d1c87]
??:? [0x557b4caaf3]
??:? [0x557b4ca027]
??:? [0x557b4ca043]
??:? [0x557b4d1953]
??:? [0x557b4d1837]
??:? [0x557b4d16a3]
??:? [0x557b4ca0cb]
??:? [0x7f9668738b]
??:? __libc_start_main [0x7f9668745f]
??:? [0x557b4c9eaf]
Error: /tmp/repro-1c99cf failed with status: 1
It's possible that bad code is being generated. One more observation, which may or may not help: uncommenting the //assert line makes the optimized build skip the if and fail on the final assert, like the unoptimized build does.
The last problem I had was with core.thread.fiber:
ctest --timeout 10 -R 'core.thread.fiber'
Test project /home/ldc/build
Start 130: core.thread.fiber
1/4 Test #130: core.thread.fiber ................***Timeout 10.03 sec
Start 565: core.thread.fiber-debug
2/4 Test #565: core.thread.fiber-debug .......... Passed 0.26 sec
Start 1000: core.thread.fiber-shared
3/4 Test #1000: core.thread.fiber-shared .........***Timeout 10.02 sec
Start 1435: core.thread.fiber-debug-shared
4/4 Test #1435: core.thread.fiber-debug-shared ... Passed 0.29 sec
The release builds hang. Running the tests directly I get:
./runtime/druntime-test-runner core.thread.fiber
Not safe to migrate Fibers between Threads on your system. Consider setting version CheckFiberMigration for this system in thread.d
****** FAIL release64 core.thread.fiber
core.exception.AssertError@core/thread/fiber.d(1078): HOLD != EXEC
----------------
??:? [0x559245bb43]
??:? [0x559245b527]
??:? [0x55924ca597]
??:? [0x55924cbc3b]
??:? [0x55923af9e3]
??:? [0x559247070b]
??:? [0x559246c77f]
??:? [0x55924ef213]
^C
./runtime/druntime-test-runner-shared core.thread.fiber
Not safe to migrate Fibers between Threads on your system. Consider setting version CheckFiberMigration for this system in thread.d
****** FAIL release64 core.thread.fiber
core.exception.AssertError@core/thread/fiber.d(1077): Fiber.yield() called with no active fiber
----------------
??:? _d_assert_msg [0x7fbf10bbc3]
??:? void core.thread.fiber.TestFiber.run() [0x7fbf1cc31f]
??:? fiber_entryPoint [0x7fbf1c83bb]
??:? [0x7fbf24ca23]
^C
Enabling CheckFiberMigration doesn't solve it:
./runtime/druntime-test-runner-shared core.thread.fiber
****** FAIL release64 core.thread.fiber
core.exception.AssertError@core/thread/fiber.d(1078): Fiber.yield() called with no active fiber
----------------
??:? _d_assert_msg [0x7f8555bbe3]
??:? void core.thread.fiber.TestFiber.run() [0x7f8561c3cb]
??:? fiber_entryPoint [0x7f856183db]
??:? [0x7f8569cb43]
^C
On top of that, I'm also getting segmentation faults sometimes:
./runtime/druntime-test-runner-shared core.thread.fiber
/home/ldc/build/lib/libdruntime-ldc-unittest-shared.so.108(_D4core7runtime18runModuleUnitTestsUZ19unittestSegvHandlerUNbNiiPSQCm3sys5posix6signal9siginfo_tPvZv+0x24)[0x7f81537500]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x7f8167d7a8]
/home/ldc/build/lib/libdruntime-ldc-unittest-shared.so.108(_D4core6thread5fiber5Fiber9switchOutMFNbNiZv+0x1c)[0x7f8154b458]
/home/ldc/build/lib/libdruntime-ldc-unittest-shared.so.108(_D4core6thread5fiber9TestFiber3runMFZv+0x5c)[0x7f8154c38c]
/home/ldc/build/lib/libdruntime-ldc-unittest-shared.so.108(fiber_entryPoint+0x68)[0x7f815483dc]
/home/ldc/build/lib/libdruntime-ldc-unittest-shared.so.108(+0x1dcb44)[0x7f815ccb44]
Segmentation fault (core dumped)
I have no idea how to approach fixing this one.
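For context on the "HOLD != EXEC" and "no active fiber" messages above, here is a minimal sketch of the states a fiber is expected to go through on a single thread. The function name observeStates is hypothetical, and this is not the failing druntime unittest, just the basic core.thread.Fiber API.

```d
import core.thread : Fiber;

/// Hypothetical helper: records the fiber state before the first call,
/// while the fiber is running, after it yields, and after it finishes.
Fiber.State[] observeStates()
{
    Fiber.State[] seen;
    Fiber fib;
    fib = new Fiber({
        seen ~= fib.state; // EXEC: we are inside the running fiber
        Fiber.yield();     // legal here, because a fiber is active
    });

    seen ~= fib.state;     // HOLD: created but not started
    fib.call();            // runs until the yield
    seen ~= fib.state;     // HOLD: suspended at the yield
    fib.call();            // resumes and runs to completion
    seen ~= fib.state;     // TERM: finished
    return seen;
}

void main()
{
    assert(observeStates() == [Fiber.State.HOLD, Fiber.State.EXEC,
                               Fiber.State.HOLD, Fiber.State.TERM]);
    // Outside any fiber there is no active fiber to yield from.
    assert(Fiber.getThis() is null);
}
```

The assertions in the logs suggest that, in the optimized build, the code running inside the fiber does not see itself as the active fiber in the EXEC state, which is the invariant this sketch exercises in the simple single-threaded case.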
This is a subset of https://github.com/ldc-developers/ldc/blob/e170ca51e673c258437915a8aff3feb31ea6802b/.cirrus.yml#L56-L64 (it's been a while since I checked whether anything has improved), which is more up-to-date than https://github.com/ldc-developers/ldc/issues/2153#issuecomment-626028446 from the AArch64 tracker issue.
IIRC, the math issues boil down to 2 problems - one being slightly incomplete 128-bit quadruple-precision real support in upstream Phobos, and something special wrt. optimized code and NaNs on AArch64 (not preserving the NaN payload or something like that).
The sporadic core.thread.fiber failures with enabled optimizations happen on macOS arm64 too, contrary to the math issues (Apple uses 64-bit real, on AArch64 too).
Thanks for the new links, I should have looked a bit harder before opening the issue.
> IIRC, the math issues boil down to 2 problems - one being slightly incomplete 128-bit quadruple-precision real support in upstream Phobos, and something special wrt. optimized code and NaNs on AArch64 (not preserving the NaN payload or something like that).
The std.math.exponential bug doesn't look like it involves NaNs. The failure is caused by that if statement being entered even though the value of y is 5.1923e+33, which shouldn't be representable as a long. I don't know how floating point numbers work in assembly though, much less on AArch64, so I won't try to dissect this issue further.
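One way to sidestep the question of what cast(long) does with such a large value is to check the range before converting. The sketch below does that; isExactLong is a hypothetical helper, not the Phobos code. The underlying assumption, which nobody in this thread has confirmed as the cause, is that an out-of-range floating-point to integer conversion has no well-defined result (LLVM's fptosi yields a poison value in that case), so the optimizer may treat the comparison either way.

```d
import std.math.traits : isFinite;

/// Hypothetical helper: true only if y holds an integer value that fits
/// in a long. The range check runs before the cast, so the cast is never
/// applied to an out-of-range value.
bool isExactLong(real y)
{
    // 2^63 is exactly representable in every real format; long.max is not.
    enum real limit = 0x1p63L;
    if (!isFinite(y) || y >= limit || y < -limit)
        return false;
    return cast(long) y == y;
}

void main()
{
    assert(isExactLong(3.0L));
    assert(!isExactLong(2.5L));
    assert(!isExactLong(real.infinity));
    // The value from the reduced test case: out of range with a 113-bit
    // mantissa, a non-integer with a 64-bit or 53-bit one.
    assert(!isExactLong(1 / real.epsilon - 0.5L));
}
```

With a guard like this, the reduced pow would skip the if on every real format, matching what the unoptimized build does.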
One last thing: should CheckFiberMigration be set for all AArch64 systems and not just Apple, since I was getting that warning, or is it safe to ignore?
FYI it is this unittest in core.thread.fiber that is causing trouble on AArch64 (both macOS, and linux-musl): https://github.com/ldc-developers/ldc/blob/40ad5f7583fe90a85fd675bfc0cfc287d565c94a/runtime/druntime/src/core/thread/fiber.d#L2303-L2350
version(AArch64)
return;
fixes things for me. (both on macOS and linux musl)
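As a sketch of what that workaround amounts to, the same version condition can be written as a small guard. The function name migrationTestEnabled is made up for illustration; in druntime the two lines simply go at the top of the unittest linked above.

```d
import std.stdio : writeln;

/// Hypothetical guard mirroring the workaround: reports whether the
/// fiber migration test should run on the current target.
bool migrationTestEnabled()
{
    version (AArch64)
        return false; // skip: the test hangs or crashes with optimizations
    else
        return true;
}

void main()
{
    if (!migrationTestEnabled())
    {
        writeln("skipping fiber migration test on AArch64");
        return;
    }
    writeln("running fiber migration test");
}
```

Note that this only hides the failure on AArch64. It does not address whatever goes wrong in the optimized build.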