openmp.gesv_{double,mrhs_double,complex_double,mrhs_complex_double} failures w/ GCC 10.2.0 & c++17, with blas tpl enabled

Open e10harvey opened this issue 2 years ago • 4 comments

Example of failure output with gcc 10.2.0 & c++17:

/path/to/kokkos-kernels/unit_test/blas/Test_Blas_gesv.hpp:121: Failure
Value of: true
Expected: test_flag
Which is: false

Similar failures are printed for the other 3 tests.

Reproducer

With kokkos@4477a25ebe12b655cd5da273eec4ab954fbf32d5 and kokkos-kernels@0f5c8cc57f366a902cf415b97898d2ed88de9d56:

source /etc/profile.d/modules.sh
module purge
module load cmake/3.19.3 gcc/10.2.0 openblas/0.3.13/gcc/10.2.0

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash \
  --with-devices=OpenMP,Serial \
  --arch=SKX \
  --compiler=$GCC_PATH/bin/g++ \
  --cxxflags="-O3 -Wall -Wunused-parameter -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized" \
  --cxxstandard="17" \
  --ldflags="" \
  --kokkos-path=$KOKKOS_PATH \
  --kokkoskernels-path=$KOKKOSKERNELS_PATH \
  --with-scalars='double,complex_double' \
  --with-ordinals=int \
  --with-offsets=int,size_t \
  --with-layouts=LayoutLeft \
  --with-tpls=blas \
  --user-blas-path=$OPENBLAS_ROOT/lib \
  --user-lapack-path=$OPENBLAS_ROOT/lib \
  --user-blas-lib=blas \
  --user-lapack-lib=lapack \
  --extra-linker-flags=-lgfortran,-lm \
  --with-options= \
  --with-cuda-options= \
  --no-examples

e10harvey, Sep 12 '22 21:09

@vqd8a : Would you have time to take a look?

e10harvey, Sep 12 '22 21:09

@e10harvey: I will look at it.

vqd8a, Sep 12 '22 21:09

@e10harvey I think there are some issues with the current openblas/0.3.13/gcc/10.2.0 module on blake when using c++17. The gesv tests pass only with OMP_NUM_THREADS <= 4; with more threads they fail.

When I ran these tests against OpenBLAS 0.3.13 and higher versions installed in my home directory on blake (gcc 10.2.0, c++17), they passed with any OMP_NUM_THREADS.
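The thread-count dependence described above can be checked with a small sweep. This is only a sketch: `sweep_threads` is a hypothetical helper, not part of kokkos-kernels, and it assumes the test command exits non-zero when a test fails (as Google Test binaries do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per thread count and report
# whether it exited successfully.
# Usage: sweep_threads "<space-separated thread counts>" <command> [args...]
sweep_threads() {
  local counts="$1"
  shift
  local n
  for n in $counts; do
    # OMP_NUM_THREADS is set only in the environment of this one invocation.
    if OMP_NUM_THREADS="$n" "$@" >/dev/null 2>&1; then
      echo "OMP_NUM_THREADS=$n PASS"
    else
      echo "OMP_NUM_THREADS=$n FAIL"
    fi
  done
}
```

For example, `sweep_threads "1 2 4 8 16" "$TEST_BIN" --gtest_filter='openmp.gesv_*'`, where `TEST_BIN` is assumed to point at the built BLAS unit-test executable (its name depends on the build) and `--gtest_filter` is the standard Google Test option for selecting tests.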

Can we request newer versions of OpenBLAS on blake?

vqd8a, Sep 20 '22 15:09

> Can we request newer versions of OpenBLAS on blake?

Ok, I filed a ticket for this.

e10harvey, Sep 20 '22 18:09

@e10harvey For some reason, despite PR #1562, the build is not using openblas 0.3.20, as seen in the Jenkins output and the reproducer instructions. The CI is currently failing for that reason.

lucbv, Oct 11 '22 22:10

Note that I tested with the new openblas and the tests pass with it; it's just not being picked up by the script.

lucbv, Oct 11 '22 22:10

Never mind, I found my issue...

lucbv, Oct 12 '22 02:10

@e10harvey Can we close this issue now that we have a newer version of OpenBLAS on blake and the tests pass?

vqd8a, Nov 22 '22 21:11

Yes, this can be closed.

e10harvey, Nov 22 '22 21:11

Thanks @e10harvey

vqd8a, Nov 22 '22 21:11