OpenBLAS
LAPACK test fails with `Error code from DDRGES3 = 9` on AMD Genoa
System:
$ lscpu | grep 'Model name:'
Model name: AMD EPYC 9554 64-Core Processor
$ uname -a
Linux hpcl002 4.18.0-372.32.1.el8_6.x86_64 #1 SMP Fri Oct 7 12:35:10 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/cm/shared/uniol/sw/zen4/12.2/GCCcore/12.2.0/libexec/gcc/x86_64-pc-linux-gnu/12.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --without-cuda-driver --enable-offload-targets=nvptx-none --enable-lto --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-plugins --enable-gold --enable-ld=default --prefix=/cm/shared/uniol/sw/zen4/12.2/GCCcore/12.2.0 --with-local-prefix=/cm/shared/uniol/sw/zen4/12.2/GCCcore/12.2.0 --enable-bootstrap --with-isl=/scratch/easybuild/build/GCCcore/12.2.0/system-system/gcc-12.2.0/stage2_stuff --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC)
Build and Test Commands: Building and testing OpenBLAS-0.3.23 (using Easybuild) with the following commands:
$ make -j 256 libs netlib shared BINARY='64' CC='gcc' FC='gfortran' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' CFLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno'
$ make tests BINARY='64' CC='gcc' FC='gfortran' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1'
$ make lapack-test BINARY='64' CC='gcc' FC='gfortran' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1'
Test results:
`make tests` completes without error; the LAPACK tests return this summary:
                    --> LAPACK TESTING SUMMARY <--
SUMMARY             nb test run    numerical error      other error
================    ===========    =================    ================
REAL                1328283        0   (0.000%)         0 (0.000%)
DOUBLE PRECISION    1327545        1   (0.000%)         1 (0.000%)
COMPLEX             779587         171 (0.022%)         0 (0.000%)
COMPLEX16           780654         97  (0.012%)         0 (0.000%)
--> ALL PRECISIONS  4216069        269 (0.006%)         1 (0.000%)
I think the "other error" is coming from:
DGS drivers: 1 out of 1555 tests failed to pass the threshold
*** Error code from DDRGES3 = 9
All details are in testing_results.txt
Questions: How can I get this other error resolved? And should I worry about the 269 tests with numerical errors?
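For anyone who wants to track these counts across builds rather than eyeball them, here is a minimal sketch that parses the per-precision rows of the summary and re-derives the totals. The row text is copied from the summary above; the regular expression and the `parse_summary` helper are my own illustration and are not part of the OpenBLAS or LAPACK tooling.

```python
import re

# Per-precision rows copied from the summary in this report.
SUMMARY = """\
REAL 1328283 0 (0.000%) 0 (0.000%)
DOUBLE PRECISION 1327545 1 (0.000%) 1 (0.000%)
COMPLEX 779587 171 (0.022%) 0 (0.000%)
COMPLEX16 780654 97 (0.012%) 0 (0.000%)
"""

# name, tests run, numerical errors (pct), other errors (pct)
ROW = re.compile(
    r"^(?P<name>[A-Z0-9 ]+?)\s+(?P<run>\d+)\s+"
    r"(?P<num>\d+)\s+\([\d.]+%\)\s+(?P<other>\d+)\s+\([\d.]+%\)\s*$"
)


def parse_summary(text):
    """Return {precision: (tests_run, numerical_errors, other_errors)}."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows[m["name"]] = (int(m["run"]), int(m["num"]), int(m["other"]))
    return rows


rows = parse_summary(SUMMARY)
total_run = sum(r[0] for r in rows.values())
total_num = sum(r[1] for r in rows.values())
total_other = sum(r[2] for r in rows.values())
print(total_run, total_num, total_other)
```

The three totals should agree with the "ALL PRECISIONS" line of the summary, which makes it easy to compare one build against another.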
Can you please re-test with the `-ftree-vectorize` in your CFLAGS replaced with its opposite, `-fno-tree-vectorize`? (The tree vectorizer is on by default in GCC 12.2 and is known to cause this kind of problem.)
And/or try the current `develop` branch. Unfortunately I do not have such a big Ryzen system available to me at the moment, but there were some recent fixes (added #pragma "no-tree-vectorize") for GCC 11 and 12 over-optimizing some complex BLAS functions.
The "other" error is probably a failed iteration in one of the LAPACK routines called, so ultimately a numerical accuracy problem as well. (Also see https://github.com/Reference-LAPACK/lapack/issues/732 and linked issues, sadly parts of the testsuite appear to be too stringent to be useful in the context of optimized implementations or standard optimizations performed by modern compilers)
Using the newly released GCC 13 could also be an option - at least according to my first tests, it appears to have fixed the tree-optimizer bugs that had me put the pragmas in the known affected source files.
Thanks for the suggestions. I tried `-fno-tree-vectorize` first; unfortunately the result is the same. Going to GCC 13 might be a good idea, as it will also support Zen4 better, so I will try this next.
Some progress: building with GCC 13.1.0 reduces the number of numerical errors to 26, all in the COMPLEX (4) or COMPLEX16 (21) tests. However, the error with code 9 in DDRGES3 remains. Will try development branch on Monday then.
Also related: https://github.com/Reference-LAPACK/lapack/issues/744 and https://github.com/Reference-LAPACK/lapack/issues/475 (the latter was supposed to be fixed by https://github.com/Reference-LAPACK/lapack/pull/477, but this appears to be rather fragile code with a long history of odd and fleeting convergence problems). Note also how the reported result is always a dramatic 4.5E+15.
Test summary from building with GCC 13.1 and `-ftree-vectorize`:
                    --> LAPACK TESTING SUMMARY <--
SUMMARY             nb test run    numerical error      other error
================    ===========    =================    ================
REAL                1328283        0  (0.000%)          0 (0.000%)
DOUBLE PRECISION    1327545        1  (0.000%)          1 (0.000%)
COMPLEX             786943         4  (0.001%)          0 (0.000%)
COMPLEX16           786918         21 (0.003%)          0 (0.000%)
--> ALL PRECISIONS  4229689        26 (0.001%)          1 (0.000%)
There is no difference between `-ftree-vectorize` and `-fno-tree-vectorize`, and also no difference between version 0.3.23 and the development branch. The error code 9 means that reordering failed in `DTGSEN`; it is set at line 550 of `dgges3.f`.
I also changed `Makefile.x86_64`:
$ diff -ru Makefile.x86_64.orig Makefile.x86_64
--- Makefile.x86_64.orig 2023-05-08 13:22:43.147444042 +0200
+++ Makefile.x86_64 2023-05-08 13:23:22.597020079 +0200
@@ -133,9 +133,9 @@
ifeq ($(CORE), ZEN)
ifdef HAVE_AVX512VL
ifndef NO_AVX512
-CCOMMON_OPT += -march=skylake-avx512
+CCOMMON_OPT += -march=znver4
ifneq ($(F_COMPILER), NAG)
-FCOMMON_OPT += -march=skylake-avx512
+FCOMMON_OPT += -march=znver4
endif
ifeq ($(OSNAME), CYGWIN_NT)
CCOMMON_OPT += -fno-asynchronous-unwind-tables
but this also had no notable effect.
Surprisingly to me, changing the overall optimization from `-O2` to `-O1` changes the number of numerical errors to 76 in total (14 in REAL, 36 in DOUBLE, 4 in COMPLEX, and 22 in COMPLEX16) but gives no other error. Not sure if this helps.
Yes, the `-O1` effect is one of those counter-intuitive things where (probably) using fewer instructions means having fewer instances of rounding error on intermediate results. Using `-march=znver4` will affect some instruction cost calculations but would need to be guarded with another gcc version check (and I think the performance gain should be pretty marginal). Lastly, you get much the same picture on a lowly zen3-based laptop, so cpu model and core count do not play much of a role after all. ISTR these test failures crept up after algorithm changes in Reference-LAPACK 3.10, but they appear to be more of a nuisance than an actual defect.
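As a general illustration of why the optimization level can move the error count: floating-point addition is not associative, so any transformation that regroups operations (vectorization, unrolling, fused instructions) changes where rounding happens. The sketch below is plain Python on IEEE 754 doubles; it is not an OpenBLAS kernel and does not claim to reproduce what GCC actually does at `-O1` versus `-O2`.

```python
# Illustration only: the same three numbers, summed in two groupings.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # rounds after a+b, then again after adding c
right_to_left = a + (b + c)   # rounds after b+c, then again after adding a

print(left_to_right == right_to_left)  # False
print(abs(left_to_right - right_to_left))
```

A test that compares against a tight threshold can therefore pass with one grouping and fail with another, even though both results are equally valid roundings.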
Thanks, I will ignore the errors for now and will keep an eye on the future releases of GCC and OpenBLAS.
We're seeing this same problem (1 failing test due to a non-numerical issue, "DDRGES: DGGES returned INFO= 9.") in different settings, including when:
- building and testing OpenBLAS 0.3.23 for the `x86_64/generic` target in EESSI 2023.06
- building and testing OpenBLAS 0.3.24 on POWER9
Is there an easy way to selectively disable this particular test, to avoid blindly ignoring other failing tests which do signal a problem worth looking into?
Probably by editing lapack-netlib/TESTING/dgg.in to either disable all these tests or remove the parameter(s) of the offending one; I have not confirmed this though. Interesting that it would happen with the generic build as well, where there is no FMA optimization beyond what the compiler does, and only some loop unrolling (assuming easybuild's "generic" corresponds to TARGET=GENERIC in OpenBLAS).
Sorry, `dgd.in`, not `dgg.in`. Specifically, remove the "6" from the first list of matrix dimensions in line 3 of that file (6 eigenvalues + error code 3 => INFO=9).
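The "6 + 3 => 9" arithmetic above follows the INFO convention of DGGES, where values greater than the matrix order N encode specific failures. A small sketch of that decoding is below; the function name `explain_dgges_info` is made up for illustration, and the messages are paraphrased from my reading of the INFO section of the Reference-LAPACK DGGES documentation, so please check them against the version you build.

```python
def explain_dgges_info(info, n):
    """Paraphrase the meaning of DGGES's INFO for a matrix of order n."""
    if info == 0:
        return "successful exit"
    if info < 0:
        return f"argument {-info} had an illegal value"
    if info <= n:
        return "the QZ iteration failed; (A,B) are not in Schur form"
    if info == n + 1:
        return "a failure other than the QZ iteration occurred in the QZ routine"
    if info == n + 2:
        return ("after reordering, roundoff changed some complex eigenvalues "
                "so that the leading ones no longer satisfy SELCTG")
    if info == n + 3:
        return "reordering failed in DTGSEN"
    return "value outside the documented range"


# The failing case from this thread: matrix dimension 6, INFO = 9.
print(explain_dgges_info(9, 6))
```

With N = 6 and INFO = 9 this lands in the N+3 branch, which matches the DTGSEN reordering failure reported earlier in the thread.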
@martin-frbg Seems like that worked like a charm, see https://github.com/easybuilders/easybuild-easyconfigs/pull/18887 + https://github.com/EESSI/software-layer/pull/334#issuecomment-1740493522 in which we're retrying the build of OpenBLAS 0.3.23 with the patch included.