tfel gcc-11: 10930 - crossed2deltachaboche_mtest fails in `spack install -v --test=root tfel %gcc@11.2.0`

I can reproduce a test failure of tfel@4.0.0. It fails every time with:

- the standard gcc-11.2.0 of Ubuntu 21.10,
- the gcc-11.1 of Arch Linux, and
- the gcc-11.1 from add-apt-repository ppa:ubuntu-toolchain-r/test on Ubuntu 18.04 on an AMD EPYC Rome CPU.

No test failures on Ubuntu 18.04:

- With gcc-8.4.0, gcc-9.4.0, gcc-10.3.0 (from the PPA above, on the same AMD server).
- On an Intel Cascade Lake, when using gcc@11.2.0 built by spack with spack install -v gcc@11.2.0; spack load gcc; spack compiler find, the test passes: Test #10930: crossed2deltachaboche_mtest .. Passed 0.02 sec

Manual execution of the test case with verbose enabled (excerpt):

cd /tmp/$USER/spack-stage/spack-stage-tfel-4.0.0-pma6uz32atkqqv6jvl7jqaydph3snlaz/spack-build-pma6uz3/mtest/src/
./mtest --xml-output=true --result-file-output=true --verbose=level1 ../../mfront/tests/behaviours/crossed2delta/crossed2deltachaboche.mtest
...
resolution from 1.2625 to 1.34875
iteration 1 : 0.000109113 12.1871 (0.0541281 -0.026879 -0.0268775 0.114553 0 0)
iteration 2 : 0.000108783 12.1502 (0.0541794 -0.0269047 -0.0269032 0.114662 0 0)
CrossedSecant acceleration convergence
iteration 3 : 0.000108371 12.1042 (0.0542435 -0.0269367 -0.0269352 0.114798 0 0)
Crossed2Delta acceleration convergence
iteration 4 : 9.06824e-06 1.01285 (0.0711146 -0.0353723 -0.0353708 0.150582 0 0)
CrossedSecant acceleration convergence
iteration 5 : 8.2335e-07 0.0919619 (0.0726553 -0.0361427 -0.036141 0.15385 0 0)
CrossedSecant acceleration convergence
iteration 6 : 1.74212e-07 0.0191781 (0.0728155 -0.0362218 -0.0362221 0.154177 0 0)
convergence, after 6 iterations, order 0.647358

resolution from 1.34875 to 1.435
iteration 1 : 0.000109324 12.2107 (0.0728747 -0.0362396 -0.0362398 0.154287 0 0)
iteration 2 : 0.000108934 12.1671 (0.0729261 -0.0362653 -0.0362655 0.154395 0 0)
CrossedSecant acceleration convergence
iteration 3 : 0.000108604 12.1303 (0.072988 -0.0362963 -0.0362964 0.154527 0 0)
Crossed2Delta acceleration convergence
iteration 4 : 1.05406e-05 1.17731 (0.093516 -0.046561 -0.0465597 0.198076 0 0)
CrossedSecant acceleration convergence
iteration 5 : 2.50852e-06 0.280183 (0.0958002 -0.0476918 -0.0477132 0.202764 0 0)
Crossed2Delta acceleration convergence
iteration 6 : 1.53076e-07 0.017153 (0.0959923 -0.0477984 -0.0477986 0.203319 0 0)
convergence, after 6 iterations, order 1.94806

Execution succeeded
-number of period:     13
-number of iterations: 128
-number of sub-steps:  0
Result of test 'unit behaviour test' of group 'MTest'            : FAILED
*                                                                : FAILED
* *ReferenceFileComparisonTest::check : comparison for variable ': FAILED
* ReferenceFileComparisonTest::check : comparison for variable 'E: SUCCESS
* ReferenceFileComparisonTest::check : comparison for variable 'S: SUCCESS
End of Test Suite                                                : FAILED

I did not modify any compiler flags, and will report the CFLAGS used. Full steps to reproduce: create a new container from the Ubuntu 21.10 cloud image, then run:

apt update;apt upgrade
apt install bzip2 unzip make g++-11 gfortran-11 python3-dev
git clone [email protected]:thelfer/spack.git
bin/spack install -v --test=root tfel

Nov 02 '21 11:11 bernhardkaindl

@bernhardkaindl Thanks for reporting! I just tested with the default TFEL flags in release mode on Ubuntu 21.10 with the default gcc, and all tests pass. To be continued.

Nov 02 '21 11:11 thelfer

Launched "spack install -v --test=root tfel%gcc@11.2.0" on my Debian box. Let's see what happens.

Nov 02 '21 11:11 thelfer

Sample g++ call from the verbose build output (this is from the Intel Cascade Lake system using spack's gcc@11.2.0, which passed the test):

...spack/lib/spack/env/gcc/g++ -DASTER_ARCH=64 -DCASTEM_UNIX_TYPE=UNIX64 -DCYRANO_ARCH=64 -DHAVE_ABAQUS=1 -DHAVE_ANSYS=1 -DHAVE_ASTER=1 -DHAVE_CALCULIX=1 -DHAVE_COMSOL=1 -DHAVE_CYRANO=1 -DHAVE_DIANAFEA=1 -DHAVE_EUROPLEXUS=1 -DHAVE_FENV -DHAVE_FORTRAN=1 -DHAVE_LSDYNA=1 -DHAVE_PYTHON=1 -DLOCAL_CASTEM_HEADER=1 -DMFrontCastemBehaviours_EXPORTS -DTFEL_ARCH64 -DTFEL_PYTHON_BINDINGS -I...spack-build-rz6kppa/mfront/tests/behaviours/castem/include -I...spack-src/mfront/include -I...spack-src/include -I...spack-src/tfel-check/include -O3 -DNDEBUG -fPIC -DVERSION=""4.0.0"" -DTFEL_SVN_REVISION="""" -DTFEL_CMAKE_GENERATOR=""Unix Makefiles"" -DOPTIMISATION_FLAGS0=""-fvisibility-inlines-hidden -fvisibility=hidden -fno-fast-math -DTFEL_NO_RUNTIME_CHECK_BOUNDS -O2 -DNDEBUG "" -DOPTIMISATION_FLAGS=""-ftree-vectorize -march=native "" -DOPTIMISATION_FLAGS2=""-ffast-math "" -DCOMPILER_WARNINGS=""-Wdisabled-optimization -Wno-unused-macros -Wno-missing-declarations -Wno-sign-compare -Wno-switch-enum -Wsuggest-override -Wint-in-bool-context -Wregister -Wduplicated-branches -Wmisleading-indentation -Wduplicated-cond -Wnull-dereference -Wtautological-compare -Wshift-overflow -Wshift-negative-value -Wbool-compare -Wsizeof-array-argument -Wlogical-not-parentheses -Wswitch-bool -Wsequence-point -Wignored-qualifiers -Wvector-operation-performance -Wtrampolines -Wstrict-null-sentinel -Wold-style-cast -Wnoexcept -Wmissing-include-dirs -Wlogical-op -Winit-self -Wdouble-promotion -Wno-conversion -Wreorder -Wundef -Wunknown-pragmas -Wredundant-decls -Wpacked -Wno-deprecated-declarations -Wno-multichar -Wmissing-format-attribute -Wno-endif-labels -Wfloat-equal -Wreturn-type -Woverloaded-virtual -Wnon-virtual-dtor -Wctor-dtor-privacy -Wwrite-strings -Wcast-align -Wcast-qual -Wpointer-arith -Wshadow -W -Wall -Wno-conversion "" -DCOMPILER_FLAGS="""" -DCOMPILER_CXXFLAGS="" -DTFEL_HAVE_NORETURN_ATTRIBUTE"" -DCASTEM_CPPFLAGS=""-DLINUX64 -DUNIX64 -DTHREAD"" -DLINUX64 -DUNIX64 -DTHREAD 
-DLOCAL_CASTEM_HEADER=1 -DHAVE_CASTEM=1 -DTFEL_PYTHON_INCLUDES=""-I/home/kaindlb/dev/spack/opt/spack/linux-ubuntu18.04-cascadelake/gcc-11.2.0/python-3.8.12-d5meujhbzx7phve3oo2jfgocszafkjm3/include/python3.8"" -DTFEL_PYTHON_LIBRARY_PATH=""/home/kaindlb/dev/spack/opt/spack/linux-ubuntu18.04-cascadelake/gcc-11.2.0/python-3.8.12-d5meujhbzx7phve3oo2jfgocszafkjm3/lib"" -DTFEL_PYTHON_LIBRARY=""python3.8"" -DTFEL_PYTHON_LIBS=""-L/home/kaindlb/dev/spack/opt/spack/linux-ubuntu18.04-cascadelake/gcc-11.2.0/python-3.8.12-d5meujhbzx7phve3oo2jfgocszafkjm3/lib -lpython3.8"" -DMFRONT_COMPILING -std=gnu++17 -MD -MT mfront/tests/behaviours/castem/CMakeFiles/MFrontCastemBehaviours.dir/src/T91MartensiticSteel_b_ROUX2007-mfront.o -MF CMakeFiles/MFrontCastemBehaviours.dir/src/T91MartensiticSteel_b_ROUX2007-mfront.o.d -o CMakeFiles/MFrontCastemBehaviours.dir/src/T91MartensiticSteel_b_ROUX2007-mfront.o -c ...spack-build-rz6kppa/mfront/tests/behaviours/castem/src/T91MartensiticSteel_b_ROUX2007-mfront.cxx In file included from ...spack-src/include/TFEL/Math/Array/MutableFixedSizeArrayBase.ixx:17, from ...spack-src/include/TFEL/Math/Array/MutableFixedSizeArrayBase.hxx:53, from ...spack-src/include/TFEL/Math/Array/GenericFixedSizeArray.hxx:21, from ...spack-src/include/TFEL/Math/tensor.hxx:26, from ...spack-src/include/TFEL/Math/t2tot2.hxx:21, from ...spack-build-rz6kppa/mfront/tests/behaviours/generic/s:%s/

Nov 02 '21 12:11 bernhardkaindl

@thelfer - An Update

On an Intel Cascade Lake, when using gcc@11.2.0 built by spack with spack install -v gcc@11.2.0; spack load gcc; spack compiler find, the test passes: Test #10930: crossed2deltachaboche_mtest .. Passed 0.02 sec

Building now with gcc-11.1.0 from the Ubuntu PPA. Could it be triggered by one of Ubuntu's gcc patches? (Hm, does Arch Linux also carry patches like Ubuntu's, e.g. the ones for -fstack-protector-strong etc.?)

Nov 02 '21 12:11 bernhardkaindl

@thelfer I suspect the AMD EPYC Rome (Zen2) might play a role in this riddle. Running a few more builds: so far, the AMD Rome/Zen2 is the only server producing failures of this test, now even with a gcc@11.2.0 which was built and integrated using spack install -v gcc@11.2.0; spack load gcc; spack compiler find. (With the same method, the test passes on the Cascade Lake server, see the comment above.)

Nov 02 '21 13:11 bernhardkaindl

In case this testcase uses/exercises numpy, it could have to do with a possible remaining gcc-11 issue.

It should be fixed with gcc-11.2 (which still fails when used on the AMD Rome/Zen2), but just in case I refer to it:

https://github.com/numpy/numpy/releases/tag/v1.21.3

If you want to compile your own version using gcc-11 you will need to use gcc-11.2+ to avoid problems.

A bit more info and a link can be found in the list of releases, in the notes of the previous 1.21.x releases.

Nov 02 '21 16:11 bernhardkaindl

The flags used by spack seem to be -O3 -DNDEBUG.

I rarely use -O3 because, in some previous experiments of mine, -O2 -ffast-math was the most efficient. But this was a long time ago. Maybe I should reconsider this choice.

Nov 02 '21 22:11 thelfer

Argh. Just compiled TFEL with -O3 -DNDEBUG on my Debian box with gcc-11.2.0 from spack and no problem at all! I did not use spack install -v --test=root tfel mgis though. I'll check this tomorrow.

Nov 02 '21 23:11 thelfer

> The flags used by spack seem to be -O3 -DNDEBUG

@thelfer Actually I don't know where that comes from, maybe from CMake. It does not appear to come from spack build-env tfel .... Also, the -O3 (maybe from some CMAKE_BUILD_TYPE or so) does not become the effective gcc flag in this build, as the block of args containing -DTFEL_NO_RUNTIME_CHECK_BOUNDS -O2 -DNDEBUG (which includes -O2) appears after it and overrides the -O3. (BTW, -O3 often isn't better than -O2. You'll know from benchmarks, and as you write, it wasn't when you last experimented with it, so -O2 is actually good to have there and also makes the build shorter than with -O3.)

-ffast-math is also there and it appears after -fno-fast-math. It's a bit messy to have all these options passed to the compiler. I don't know where they come from; possibly they are injected using find_package() from cmake.

The flags -ftree-vectorize -march=native which you can see above are added by the build system of tfel, and not by something else. This is why filtering them out of tfel's cmake files is effective in removing them and letting the -march/-mtune flags of spack's cc wrapper become active (but you can only "see" those when you run "ps aux|grep march=|tee ps.log" while the build is running and then search for them in ps.log). Have a look at the build yourself, it should help you to understand it better.

On the failing test case: Yes, this appears to be triggered by the combination of gcc-11 and the AMD EPYC Rome CPU.

Nov 03 '21 11:11 bernhardkaindl

@bernhardkaindl

> @thelfer Actually I don't know where that comes from, maybe from CMake. It does not appear to come from spack build-env tfel ...

You're probably right. It may be the default from cmake.

> -ffast-math is also there and it appears after -fno-fast-math. It's a bit messy to have all these options passed to the compiler

We don't use both at the same time. The code in cmake/modules/gcc.cmake merely checks if they are supported.

I must admit that the way TFEL handles compiler options is a bit confusing. It tries to autodetect the available options, in particular additional warnings. By default, we discard the values of CXXFLAGS, CFLAGS, etc., and the cmake defaults override them.

We also keep the selected options to compile the MFront files (after TFEL has been installed).

But in practice, my checks show that this is not the case when building spack packages.

> On the failing test case: Yes, this appears to be triggered by the combination of gcc-11 and the AMD EPYC Rome CPU

So I would not be too picky about it. I will change the test criteria if I ever face this case.
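What "changing the test criteria" could mean can be pictured with a small sketch. This is not MTest's actual comparison code: the function name `compare_to_reference`, the tolerances, and the numbers are all made up for illustration. It only shows how a result that differs from the reference in the last printed digits passes or fails depending on the absolute tolerance chosen.

```python
def compare_to_reference(computed, reference, abs_tol):
    """Return the indices where |computed - reference| exceeds abs_tol."""
    if len(computed) != len(reference):
        raise ValueError("computed and reference must have the same length")
    return [
        i
        for i, (c, r) in enumerate(zip(computed, reference))
        if abs(c - r) > abs_tol
    ]


# Hypothetical values: the second entry differs by about 3e-7.
reference = [0.0728155, -0.0362218, 0.154177]
computed = [0.0728155, -0.0362221, 0.154177]

strict = compare_to_reference(computed, reference, abs_tol=1e-8)
loose = compare_to_reference(computed, reference, abs_tol=1e-6)

print("failing indices with abs_tol=1e-8:", strict)  # the second entry fails
print("failing indices with abs_tol=1e-6:", loose)   # nothing fails
```

Loosening a tolerance hides small platform-dependent differences, but it also hides small real regressions, so the value is a judgment call for the test author.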

> often -O3 isn't better than -O2

Well, I did a lot of tests (thanks to you!) and the difference is quite amazing so far.

The combination -O3 -march=native -ffast-math does however lead to about twenty test failures (and none with -O3 -march=native, so adding -ffast-math triggers something, but -O2 -ffast-math is known to work).

Nov 03 '21 16:11 thelfer

> -ffast-math is also there and it appears after -fno-fast-math. It's a bit messy to have all these options passed.. We don't use both at the same time. The code in cmake/modules/gcc.cmake merely checks if they are supported.

I see, but with all these -DOPTIMISATION_FLAGS=""-ftree-vectorize -march=native "" -DOPTIMISATION_FLAGS2=... (and so on, and on, and on, and on, and on) passed to most files, while only tfel-config/src/tfel-config.cxx actually uses these defines, it's really hard to see which flags are actually used.
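One way to make this easier to see is to split a logged command line the way a shell would and separate real flags from -D defines. The sketch below is only an illustration: the helper name `split_flags` and the shortened command line are made up, and it assumes that each -DOPTIMISATION_FLAGS...="..." value reaches the compiler as one single argument (the quoting in the log above does not show this unambiguously). It relies on the documented gcc rule that, when several -O options are given, the last one is effective.

```python
import shlex


def split_flags(command):
    """Separate real optimisation flags from -D defines in a command line."""
    tokens = shlex.split(command)  # honours quotes like a POSIX shell
    opt_levels = [t for t in tokens if t.startswith("-O")]
    math_flags = [t for t in tokens if t in ("-ffast-math", "-fno-fast-math")]
    defines = [t for t in tokens if t.startswith("-D")]
    return {
        # The last -O option on the command line is the effective one.
        "effective_opt": opt_levels[-1] if opt_levels else None,
        "effective_fast_math": math_flags[-1] if math_flags else None,
        "defines": defines,
    }


# Hypothetical, heavily shortened command line in the style of the log.
command = (
    "g++ -O3 -DNDEBUG -fPIC "
    '-DOPTIMISATION_FLAGS0="-fno-fast-math -O2 -DNDEBUG " '
    '-DOPTIMISATION_FLAGS2="-ffast-math " '
    "-std=gnu++17 -c example.cxx"
)

result = split_flags(command)
print(result["effective_opt"])        # -O3: the -O2 is only text inside a define
print(result["effective_fast_math"])  # None: no real fast-math flag is passed
print(len(result["defines"]))         # 3
```

Under that quoting assumption, the -O2 and -ffast-math strings inside the defines never act as compiler options; they are just macro values.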

And to add insult to injury, the name of the cmake macro tfel_enable_cxx_compiler_flag() is grossly misleading:

  tfel_enable_cxx_compiler_flag(OPTIMISATION_FLAGS  "fno-fast-math")
  tfel_enable_cxx_compiler_flag(OPTIMISATION_FLAGS2 "ffast-math")

Nov 03 '21 18:11 bernhardkaindl

@bernhardkaindl I do agree. It should definitely be better. It is just far from the top of the priority list :)

Nov 03 '21 18:11 thelfer

@bernhardkaindl Just changed tfel_enable_cxx_compiler_flag to tfel_add_cxx_compiler_flag_if_available in the master branch.

Nov 03 '21 20:11 thelfer

For information, I made a bunch of tests on compiler flags on a subset of the unit tests (around 11 000 tests) for version 4.0 and gcc-11.2.0:

- O3+march=native seems better than all the other configurations, in particular O2+march=native.
- Adding ftree-vectorize seems to increase compilation times without any gain. Even worse, it seems to have a negative impact.

Each configuration has been tested 2 or 3 times, which is not enough to be conclusive. However, it seems worth investigating how I could get all the tests working with O3+march=native+ffast-math, and maybe change the default compiler flags (although that has nothing to do with spack packaging).
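Why two or three runs per configuration are not conclusive can be sketched as follows. The timings are invented placeholders, not measurements from this thread, and the helper `ranges_overlap` is a hypothetical name. The check is deliberately crude: if the mean plus or minus one standard deviation intervals of two configurations overlap, the difference is within run-to-run noise.

```python
from statistics import mean, stdev


def ranges_overlap(a, b):
    """True if the mean +/- one standard deviation intervals of a and b overlap."""
    return abs(mean(a) - mean(b)) <= stdev(a) + stdev(b)


# Invented wall-clock times in seconds, three runs per configuration.
runs = {
    "O2+march=native": [105.0, 98.0, 102.0],
    "O3+march=native": [97.0, 103.0, 99.0],
}

for name, timings in runs.items():
    print(f"{name}: mean={mean(timings):.1f}s stdev={stdev(timings):.1f}s")

inconclusive = ranges_overlap(runs["O2+march=native"], runs["O3+march=native"])
print("difference within noise:", inconclusive)
```

With so few samples the standard deviation itself is poorly estimated, so more repetitions are the more reliable remedy.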

Nov 03 '21 21:11 thelfer

Nice. BTW, -ffast-math isn't a monolith in gcc and clang. From man gcc-11:

-ffast-math Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.

And -funsafe-math-optimizations itself comprises several sub-settings:

Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. ... Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.

This means that after -ffast-math, one could fine-tune it and back out some of its sub-options by adding e.g. -fno-associative-math after it (just as an example).
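The reason a sub-option like -fassociative-math can change test results is that floating-point addition is not associative. The sketch below does not involve gcc at all; it only uses Python floats (IEEE 754 doubles on common platforms) to show that reordering a sum changes the result, which is the kind of reordering that option permits the compiler to do.

```python
import math

values = [1e16, 1.0, -1e16]

# Source order: 1.0 is lost when added to 1e16, because doubles near 1e16
# are spaced 2 apart and the tie rounds back to 1e16.
left_to_right = (values[0] + values[1]) + values[2]

# Reordered: the two large terms cancel first, so 1.0 survives.
reordered = (values[0] + values[2]) + values[1]

print(left_to_right)      # 0.0
print(reordered)          # 1.0
print(math.fsum(values))  # 1.0, the correctly rounded sum
```

A test that compares against a reference file computed with one summation order can therefore fail when the compiler picks another.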

Maybe the other components of -Ofast of man gcc-11 might be interesting for you too:

It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens.

For clang, you can find its sub-options here: https://clang.llvm.org/docs/UsersManual.html

Nov 03 '21 22:11 bernhardkaindl
