OpenBLAS LAPACK test fails with `Error code from DDRGES3 = 9` on AMD Genoa

System:

$ lscpu | grep 'Model name:'
Model name:          AMD EPYC 9554 64-Core Processor
$ uname -a
Linux hpcl002 4.18.0-372.32.1.el8_6.x86_64 #1 SMP Fri Oct 7 12:35:10 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/cm/shared/uniol/sw/zen4/12.2/GCCcore/12.2.0/libexec/gcc/x86_64-pc-linux-gnu/12.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --without-cuda-driver --enable-offload-targets=nvptx-none --enable-lto --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-plugins --enable-gold --enable-ld=default --prefix=/cm/shared/uniol/sw/zen4/12.2/GCCcore/12.2.0 --with-local-prefix=/cm/shared/uniol/sw/zen4/12.2/GCCcore/12.2.0 --enable-bootstrap --with-isl=/scratch/easybuild/build/GCCcore/12.2.0/system-system/gcc-12.2.0/stage2_stuff --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC)

Build and test commands: OpenBLAS 0.3.23 was built and tested (using EasyBuild) with the following commands:

$ make -j 256 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'  CFLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno'
$ make tests  BINARY='64'  CC='gcc'  FC='gfortran'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'
$ make lapack-test  BINARY='64'  CC='gcc'  FC='gfortran'  MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1'

Test results: make tests completes without error; make lapack-test returns this summary:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1328283         0       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1327545         1       (0.000%)        1       (0.000%)
COMPLEX                 779587          171     (0.022%)        0       (0.000%)
COMPLEX16               780654          97      (0.012%)        0       (0.000%)

--> ALL PRECISIONS      4216069         269     (0.006%)        1       (0.000%)

I think the other error comes from:

DGS drivers:      1 out of   1555 tests failed to pass the threshold
 *** Error code from DDRGES3 =    9

All details are in testing_results.txt
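For scripted checks of a summary like the one above, a minimal Python sketch follows. It parses the per-precision rows and confirms that they add up to the ALL PRECISIONS line. The sample text is copied from this report; the function names `parse_summary` and `totals_consistent` are made up for illustration and are not part of OpenBLAS or the LAPACK test suite.

```python
import re

# Sample rows copied from the summary in this report. In practice the text
# would come from the captured output of `make lapack-test`.
SUMMARY = """\
REAL                    1328283         0       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1327545         1       (0.000%)        1       (0.000%)
COMPLEX                 779587          171     (0.022%)        0       (0.000%)
COMPLEX16               780654          97      (0.012%)        0       (0.000%)

--> ALL PRECISIONS      4216069         269     (0.006%)        1       (0.000%)
"""

# name, tests run, numerical errors (+ percent), other errors (+ percent)
ROW = re.compile(
    r"^(?:-->\s*)?(?P<name>[A-Z][A-Z0-9 ]*?)\s+(?P<run>\d+)\s+"
    r"(?P<num>\d+)\s+\([\d.]+%\)\s+(?P<other>\d+)\s+\([\d.]+%\)\s*$"
)


def parse_summary(text):
    """Map each row name to (tests run, numerical errors, other errors)."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows[m.group("name")] = (
                int(m.group("run")),
                int(m.group("num")),
                int(m.group("other")),
            )
    return rows


def totals_consistent(rows):
    """Check that the per-precision rows sum to the ALL PRECISIONS row."""
    total = rows["ALL PRECISIONS"]
    parts = [v for k, v in rows.items() if k != "ALL PRECISIONS"]
    sums = tuple(sum(col) for col in zip(*parts))
    return sums == total


rows = parse_summary(SUMMARY)
print(rows["DOUBLE PRECISION"], totals_consistent(rows))
```

Keeping the "other error" column separate from the numerical one makes it possible to fail a build on the former while only reporting the latter.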

Questions: How can I get this other error resolved? And should I worry about the 269 tests with numerical errors?

May 05 '23 10:05 HPC-UniOldenburg

Can you please re-test with the -ftree-vectorize in your CFLAGS replaced by its opposite, -fno-tree-vectorize? (The tree vectorizer is on by default in GCC 12.2 and is known to cause this kind of problem.) And/or try the current develop branch. Unfortunately I do not have such a big Ryzen system available to me at the moment, but there were some recent fixes (added #pragma "no-tree-vectorize") for GCC 11 and 12 over-optimizing some complex BLAS functions. The "other" error is probably a failed iteration in one of the LAPACK routines called, so ultimately a numerical accuracy problem as well. (Also see https://github.com/Reference-LAPACK/lapack/issues/732 and linked issues; sadly, parts of the testsuite appear to be too stringent to be useful in the context of optimized implementations or standard optimizations performed by modern compilers.)

May 05 '23 11:05 martin-frbg

Using the newly released GCC 13 could also be an option - at least according to my first tests, it appears to have fixed the tree-optimizer bugs that had me put the pragmas in the known affected source files.

May 05 '23 11:05 martin-frbg

Thanks for the suggestions. I tried -fno-tree-vectorize first; unfortunately, the result is the same. Going to GCC 13 might be a good idea as it will also support Zen4 better, so I will try this next.

May 05 '23 13:05 HPC-UniOldenburg

Some progress: building with GCC 13.1.0 reduces the number of numerical errors to 26, all in the COMPLEX (4) or COMPLEX16 (21) tests. However, the error with code 9 in DDRGES3 remains. Will try development branch on Monday then.

May 05 '23 19:05 HPC-UniOldenburg

Also related: https://github.com/Reference-LAPACK/lapack/issues/744 and https://github.com/Reference-LAPACK/lapack/issues/475 (the latter was supposed to be fixed by https://github.com/Reference-LAPACK/lapack/pull/477 but this appears to be rather fragile code with a long history of odd and fleeting convergence problems). Note also how the reported result is always a dramatic 4.5E+15.

May 07 '23 17:05 martin-frbg

Test summary from building with GCC 13.1 and -ftree-vectorize:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1328283         0       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1327545         1       (0.000%)        1       (0.000%)
COMPLEX                 786943          4       (0.001%)        0       (0.000%)
COMPLEX16               786918          21      (0.003%)        0       (0.000%)

--> ALL PRECISIONS      4229689         26      (0.001%)        1       (0.000%)

There is no difference between -ftree-vectorize and -fno-tree-vectorize, and also no difference between version 0.3.23 and the development branch. Error code 9 means that reordering failed in DTGSEN; it is set at line 550 of dgges3.f.
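The INFO value returned by DGGES3 depends on the matrix order N, so INFO = 9 with a DTGSEN reordering failure (INFO = N+3) implies N = 6. A small Python sketch of that decoding follows. The function name `decode_dgges3_info` is hypothetical, and the wording for N+1 and N+2 is paraphrased from the routine's header comments as I understand them, so it should be checked against dgges3.f.

```python
def decode_dgges3_info(info, n):
    """Describe a DGGES3 INFO value for a matrix of order n (illustrative)."""
    if info == 0:
        return "successful exit"
    if info < 0:
        return f"argument {-info} had an illegal value"
    if info <= n:
        return "QZ iteration failed"
    # Values above n encode failures outside the QZ iteration itself.
    beyond = {
        1: "a step other than the QZ iteration failed in the QZ routine",
        2: "after reordering, roundoff changed some complex eigenvalues "
           "so that the leading ones no longer satisfy the selection",
        3: "reordering failed in DTGSEN",
    }
    return beyond.get(info - n, "unrecognized INFO value")


# INFO = 9 for a 6x6 problem is the N+3 case seen in this report.
print(decode_dgges3_info(9, 6))
```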

I also changed Makefile.x86_64:

$ diff -ru Makefile.x86_64.orig Makefile.x86_64
--- Makefile.x86_64.orig        2023-05-08 13:22:43.147444042 +0200
+++ Makefile.x86_64     2023-05-08 13:23:22.597020079 +0200
@@ -133,9 +133,9 @@
 ifeq ($(CORE), ZEN)
 ifdef HAVE_AVX512VL
 ifndef NO_AVX512
-CCOMMON_OPT += -march=skylake-avx512
+CCOMMON_OPT += -march=znver4
 ifneq ($(F_COMPILER), NAG)
-FCOMMON_OPT += -march=skylake-avx512
+FCOMMON_OPT += -march=znver4
 endif
 ifeq ($(OSNAME), CYGWIN_NT)
 CCOMMON_OPT += -fno-asynchronous-unwind-tables

but this also had no notable effect.

To my surprise, changing the overall optimization from -O2 to -O1 changes the number of numerical errors to 76 in total (14 in REAL, 36 in DOUBLE, 4 in COMPLEX, and 22 in COMPLEX16) and the other error disappears. Not sure if this helps.

May 08 '23 16:05 HPC-UniOldenburg

Yes, the -O1 effect is one of those counter-intuitive things where (probably) using fewer instructions means having fewer instances of rounding error on intermediate results. Using -march=znver4 will affect some instruction cost calculations but would need to be guarded with another gcc version check (and I think the performance gain should be pretty marginal). Lastly, you get much the same picture on a lowly zen3-based laptop, so CPU model and core count do not play much of a role after all. ISTR these test failures crept up after algorithm changes in Reference-LAPACK 3.10, but they appear to be more of a nuisance than an actual defect.
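As a general illustration of why a different instruction sequence can move a result across a test threshold (not a reproduction of what GCC does to the OpenBLAS kernels), the Python sketch below shows that double-precision addition is not associative: the same three numbers summed in a different order differ in the last bit.

```python
# Each floating-point addition rounds its result, so the grouping of
# operations decides which intermediate values get rounded.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # rounds a+b first
right = a + (b + c)   # rounds b+c first

print(left == right)        # False
print(abs(left - right))    # about one unit in the last place
```

Iterative routines such as the QZ algorithm can amplify last-bit differences like this one, which would be consistent with the error counts changing between -O1 and -O2.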

May 08 '23 17:05 martin-frbg

Thanks, I will ignore the errors for now and will keep an eye on the future releases of GCC and OpenBLAS.

May 08 '23 18:05 HPC-UniOldenburg

We're seeing this same problem (1 failing test due to a non-numerical issue, "DDRGES: DGGES returned INFO= 9.") in different settings, including when:

Is there an easy way to selectively disable this particular test, to avoid blindly ignoring other failing tests which do signal a problem worth looking into?

Sep 28 '23 18:09 boegel

Probably by editing lapack-netlib/TESTING/dgg.in to either disable all these tests or remove the parameter(s) of the offending one. I have not confirmed this though. Interesting that it would happen with the generic build as well, where there is no FMA optimization beyond what the compiler does, and only some loop unrolling (assuming EasyBuild's "generic" corresponds to TARGET=GENERIC in OpenBLAS).

Sep 28 '23 19:09 martin-frbg

Sorry, dgd.in, not dgg.in. Specifically, remove the "6" from the first list of matrix dimensions in line 3 of that file (6 eigenvalues + error code 3 => INFO=9).
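A rough Python sketch of that dgd.in edit follows. It rests on two assumptions that are not confirmed in this thread: the sample lines are my reconstruction of how the file begins (a title line, a count line, then the dimensions line), and the count on line 2 has to be lowered to match the shortened list. The helper name `drop_dimension` is hypothetical.

```python
import re

# ASSUMED layout of the first three lines of lapack-netlib/TESTING/dgd.in;
# check the real file before applying anything like this.
SAMPLE = [
    "DGS               Data for the Real Nonsymmetric Schur Form Driver",
    "5                 Number of matrix dimensions",
    "2 6 10 12 20      Matrix dimensions",
]

# Leading run of integers, followed by the free-text label.
LEADING_INTS = re.compile(r"^((?:\s*\d+)+)(.*)$")


def drop_dimension(lines, dim=6, count_idx=1, dims_idx=2):
    """Return a copy of lines with `dim` removed from the dimensions line.

    The count line is rewritten to match (an assumption, see lead-in).
    """
    out = list(lines)
    dims_match = LEADING_INTS.match(out[dims_idx])
    count_match = LEADING_INTS.match(out[count_idx])
    if dims_match is None or count_match is None:
        raise ValueError("unexpected file layout")
    dims = [int(tok) for tok in dims_match.group(1).split()]
    if dim not in dims:
        return out  # nothing to do
    dims.remove(dim)
    out[dims_idx] = " ".join(str(d) for d in dims) + dims_match.group(2)
    out[count_idx] = str(len(dims)) + count_match.group(2)
    return out


patched = drop_dimension(SAMPLE)
print("\n".join(patched))
```

Working on a list of lines keeps the sketch independent of where the file lives; reading and writing the real dgd.in is left to the caller.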

Sep 28 '23 20:09 martin-frbg

@martin-frbg Seems like that worked like a charm, see https://github.com/easybuilders/easybuild-easyconfigs/pull/18887 + https://github.com/EESSI/software-layer/pull/334#issuecomment-1740493522 in which we're retrying the build of OpenBLAS 0.3.23 with the patch included.

Sep 29 '23 13:09 boegel
