OpenBLAS
Building on ORNL Summit (POWER9) with PGI compiler
After toying around with Makefile.power and Makefile.system for a while, I've successfully built OpenBLAS 0.3.10 on POWER9 at Summit (ORNL) with GCC 6.4.0 (the default GCC version at the time of writing). No luck with PGI 20.1 or IBM XL 16.1.1-5 yet, though. Both fail because the compiler cannot find stdatomic.h.
PGI:
"../common.h", line 696: catastrophic error: cannot open source file
"stdatomic.h"
#include <stdatomic.h>
XL:
../common.h:696:10: fatal error: 'stdatomic.h' file not found
#include <stdatomic.h>
^~~~~~~~~~~~~
I wonder if this header file is part of GCC or of glibc?
Side note:
It's interesting though that googling "PGI OpenBLAS" brings out this forum post somewhere in the first page for me, where PGI says they ship OpenBLAS with the compiler:
https://forums.developer.nvidia.com/t/auto-type-undefined-when-compiling-openblas/141268
Sure enough, in the include directory of the PGI compiler, there is a file called openblas_config.h. With PGI version 20.1, this file says
#define OPENBLAS_VERSION " OpenBLAS 0.3.7 "
Also, how do I limit the number of compile processes invoked by make to 8, instead of the auto-detected 128? I don't want to slow the cluster down for everybody... I've tried make -j 8 but it still spawns loads of processes.
Edit Makefile.rule or pass MAKE_NB_JOBS=8 to make. Not sure if PGI ships OpenBLAS with the ppc version of their compiler; probably only x86. stdatomic.h should be available if the compiler claims C11 capability, I think? The include line in common.h is guarded with an ifdef, and I do not remember seeing this problem before.
Well, it looks as if old versions of the "legacy" (pre-clang) xlc actually did not provide stdatomic.h, which is a bit unfortunate, as there are about ten files in the code where we check for a new enough __STDC_VERSION__ as the sole indicator that atomic operations are supported.
We do ship OpenBLAS with the PGI compilers on Power, though this build is done using a hybrid of GCC (for the C/ASM source files) and PGI (for the Fortran files). This is necessary, due to the presence of several unsupported features used in the C files, which the PGI C compiler is not able to compile. (e.g. the vector keyword present in some of the Power inline assembly files.)
I can make some patches available if you are interested in building it yourself this way.
Sure, that would be great!
The reason I want to build OpenBLAS on Summit is to use it with the MAGMA package. Currently I use MAGMA linked against IBM ESSL (which depends on the IBM XL runtime libraries) for the CPU BLAS/LAPACK routines. I noticed that starting with PGI 20.1, whenever there is another OpenMP library, PGI will emit an error message at runtime about NV_OMP_DISABLE_WARNINGS. So, I want to build MAGMA linked against OpenBLAS, but that means I need to be able to build OpenBLAS on Summit using PGI first...
I've tried building MAGMA with the OpenBLAS version that is bundled with PGI 20.1, but it gives some accuracy issues, so I want to test building it using either OpenBLAS 0.3.10 release version or the development version. For details, here's a link to the MAGMA user forum. https://icl.cs.utk.edu/magma/forum/viewtopic.php?f=2&t=4181&p=8624#p8624
Edit: add MAGMA forum link
@cparrott73 Thanks for the patch!
With PGI 20.1 on Summit (ORNL), I had to also propagate the changes to the POWER9 part. I also tried "translating" the CCOMMON_OPT part to PGI equivalents (e.g. -Ofast to -fast) so the rest of the C files can be compiled with pgcc directly:
ifeq ($(CORE), POWER9)
  ifeq ($(USE_OPENMP), 1)
    ifneq ($(C_COMPILER), PGI)
      CCOMMON_OPT += -Ofast -mcpu=power9 -mtune=power9 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp
    else
      CCOMMON_OPT += -fast -Mvect=simd -Mcache_align -DUSE_OPENMP -mp
    endif
    ifneq ($(F_COMPILER), PGI)
      FCOMMON_OPT += -O2 -frecursive -mcpu=power9 -mtune=power9 -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp
    else
      FCOMMON_OPT += -O2 -Mrecursive -DUSE_OPENMP -mp
    endif
  else
    ifneq ($(C_COMPILER), PGI)
      CCOMMON_OPT += -Ofast -mcpu=power9 -mtune=power9 -mvsx -malign-power -fno-fast-math
    else
      CCOMMON_OPT += -fast -Mvect=simd -Mcache_align
    endif
    ifneq ($(F_COMPILER), PGI)
      FCOMMON_OPT += -O2 -frecursive -mcpu=power9 -mtune=power9 -malign-power -fno-fast-math
    else
      FCOMMON_OPT += -O2 -Mrecursive
    endif
  endif
endif
The -fopenmp flag is not yet aliased for PGI 20.1 (the latest available version on Summit at the time of writing), so I replaced it with -mp as suggested. I'm unsure about mapping -mvsx to -Mvect=simd, as well as -malign-power to -Mcache_align...
Now I'm stuck at manually recompiling driver/others/blas_server_omp.c. The error message is again due to missing stdatomic.h. When I "force" include GCC 6.4.0's version, I reproduce the identifier "__auto_type" is undefined error mentioned in the NVIDIA developer forum post I referred to earlier.
Here are the commands I used so far:
module load gcc/6.4.0
make MAKE_NB_JOBS=8 CC=gcc FC=pgfortran
# stops compiling at the first Fortran source file with error pgfortran not found
module load pgi/20.1
make MAKE_NB_JOBS=8 CC=pgcc FC=pgfortran
# stops compiling with link error from blas_server_omp.c, undefined reference to GOMP_parallel
cd driver/others
pgcc -mp -O2 -DMAX_STACK_ALLOC=2048 -tp pwr9 -DF_INTERFACE_PGI -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=128 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.10\" -fast -Mvect=simd -Mcache_align -DUSE_OPENMP -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=blas_server -DASMFNAME=blas_server_ -DNAME=blas_server_ -DCNAME=blas_server -DCHAR_NAME=\"blas_server_\" -DCHAR_CNAME=\"blas_server\" -DNO_AFFINITY -I../.. -I/sw/summit/gcc/6.4.0/lib/gcc/powerpc64le-none-linux-gnu/6.4.0/include -c blas_server_omp.c -o blas_server.o
The full error message from the last command can be found here.
Edit: The system GCC on Summit is 4.8.5; could this be the reason it's missing stdatomic.h -- simply because it's too old?
Yes, it could be that 4.8.5 is too old to have stdatomic.h, but unlike xlc and pgcc it does not claim C11 compatibility. Basically, another patch is needed that adds a "not IBM or PGI" condition to all checks for __STDC_VERSION__ >= 201112L.
That linker error about GOMP_parallel is probably caused by f_check suppressing certain -lomp linker options when the PGI compilers are used. This seemed to be necessary on x86_64 but may have been papering over some real problem elsewhere.
~Btw that is basically what cparrott added to the other ticket.~ Actually, the -lomp suppression should occur on anything not PGI, if I remember correctly; perhaps the problem only comes from trying to do a partial rebuild (which our convoluted makefiles may not handle too well). Unfortunately I only have the community release of the compiler, which is still 19.10 AFAIK.
@martin-frbg Summit also has PGI 19.10 installed. Should I wait for #2725 to be merged first before trying to build again?
Yes, or get it in patch format by adding .diff to its github url
merged now (note I have not gotten around to merging cparrott73's patch yet but plan to do it tomorrow)
With PGI 19.10, after applying the diff patch for #2725, the build strangely didn't choke on blas_server_omp.c. Instead, it went further and finally failed at make shared phase in the second stage (the one using pgcc and pgfortran) with the following error message:
pgcc -O2 -DMAX_STACK_ALLOC=2048 -DUSE_LOCKING -tp pwr9 -DF_INTERFACE_PGI -fPIC -DNO_WARMUP -DMAX_CPU_NUMBER=128 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.10\" -fast -Mvect=simd -Mcache_align -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -w -o linktest linktest.c ../libopenblas_power9-r0.3.10.so -L/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.10-vke4btwz6u5p7gix22s2v2qj4352ctll/linuxpower/19.10/lib -L/usr/lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/4.8.5 -Wl,-rpath,/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.10-vke4btwz6u5p7gix22s2v2qj4352ctll/linuxpower/19.10/lib -L/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../lib64 -lpgftnrtl -latomic -ldl -lpthread -lpgmath -lpgc -lrt -lpthread -lm -lc && echo OK.
linktest.c:
../libopenblas_power9-r0.3.10.so: undefined reference to `__get_size_of'
../libopenblas_power9-r0.3.10.so: undefined reference to `pgf90_str_copy_klen'
../libopenblas_power9-r0.3.10.so: undefined reference to `pgf90_str_cpy1'
../libopenblas_power9-r0.3.10.so: undefined reference to `pgf90_strcmp_klen'
../libopenblas_power9-r0.3.10.so: undefined reference to `pghpf_maxloc_i8'
../libopenblas_power9-r0.3.10.so: undefined reference to `pgf90_set_intrin_type_i8'
/usr/bin/ld: link errors found, deleting executable `linktest'
make[1]: *** [../libopenblas_power9-r0.3.10.so] Error 2
make[1]: Leaving directory `/gpfs/alpine/******/scratch/wyphan/OpenBLAS-0.3.10/exports'
make: *** [shared] Error 2
(Sorry I had to censor the allocation ID on Summit.)
I know this error can simply be fixed by adding -pgf90libs to the link line. Where should I add that (specifically, which file)?
Edit: clarified wording Edit 2: found it, see patch below
You can add -pgf90libs anywhere on the link line. That should resolve the missing references.
Found it. I added the following lines to exports/Makefile:
ifeq ($(C_COMPILER), PGI)
EXTRALIB += -pgf90libs
endif
The build completes with PGI 19.10, and OpenMP disabled!
I know what's causing the issue with the stdatomic.h header file. This is a GCC-specific header file, but the PGI compilers fake enough compatibility with GCC that they fool the guards in common.h and also blas_server_omp.c.
If you change definitions like this:
#if (__STDC_VERSION__ >= 201112L)
To:
#if (__STDC_VERSION__ >= 201112L) && defined(C_GCC)
That should work around most of the problems with stdatomic.h and the associated functions it declares prototypes for.
Interestingly, we do implement the same atomic functions in libnvhpcatm, but we don't provide a compatibility header file for them. I'll have to check with our developers and see whether it was just an oversight, or if there was a particular reason for that.
I'm aware of the compilation issues with the LLVM backend, necessitating the -Mnollvm workaround. Unfortunately, the NoLLVM compilers are going away soon. I need to escalate this with our developers, and make sure they fix this one.
@cparrott73 What about the OpenMP-enabled version? Do you have any suggestions on the flags I need to use when manually recompiling blas_server_omp.c? Also, if I recall correctly, the PGI compilers on POWER don't come with the NoLLVM version anymore, right?
@martin-frbg The successful build was using USE_THREADS=0 and USE_LOCKING=1 flags. This is thread-safe, right? Can I call it from my code (which has both MPI and OpenMP), or do I still need to build again with USE_OPENMP=1 flag?
Edit: add question about NoLLVM on POWER
@wyphan that should indeed be thread-safe (and USE_OPENMP=1 is only required when USE_THREADS=1)
Just installed PGI Community Edition on an OpenPOWER VM to fully understand the stdatomic.h issue. (As luck would have it, I picked a fedora32 VM where 19.10 libpgmath complains about lack of __xxx_finite math functions so needed to add corresponding dummies to getarch.c and getarch_2nd.c to even get started)
Also there seems to be a problem with vector intrinsics support in 19.10 - several of the current microkernels try to #include altivec.h which is not present in the pgi toolchain but gets pulled in from the preinstalled gcc. (This leads to a confusing #warning message getting printed from that header, it advises to add -maltivec to the CFLAGS which pgcc does not know how to handle).
As at least the ZGEMVTKERNEL is shared between POWER8 and POWER9, I assume this is no longer an issue with the 20.x releases ?
Here is the command line we have used:
pgcc -c99 -DUSE_OPENMP -mp -DMAX_STACK_ALLOC=2048 -mp -m64 -DF_INTERFACE_PGI -fPIC -DDYNAMIC_ARCH -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=512 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.7\" -DASMNAME=blas_server -DASMFNAME=blas_server_ -DNAME=blas_server_ -DCNAME=blas_server -DCHAR_NAME=\"blas_server_\" -DCHAR_CNAME=\"blas_server\" -DNO_AFFINITY -I../.. -tp pwr8 -c blas_server_omp.c -o blas_server.o
(This is from 0.3.7, but I believe it should work for 0.3.10 as well. Update the above accordingly.)
You should be able to use the ar tool to extract the old blas_server_omp.o from the libopenblas.a archive, and replace it with this one.
Power compilers have always been LLVM-based. We never ported the NoLLVM compilers to Power.
That's a good question. I think this is the source of the heartburn with the vector keyword that I observed previously. Let me run a quick check with both PGI 20.4 and the forthcoming NVIDIA HPC SDK 20.7 development compilers and see where we are at there.
Note that the NVIDIA HPC SDK is the successor to PGI; the PGI standalone compiler products have been discontinued as of 20.4. Things will change substantially in 20.7, more on this soon as we get closer to release.
Thanks. I am currently adding #ifdef __ALTIVEC__ where needed (thankfully in most cases that just means adding them around the #include for the respective microkernel file).
@martin-frbg - will that fix this issue?
pgcc -c -O2 -DMAX_STACK_ALLOC=2048 -mp -tp pwr8 -DF_INTERFACE_PGI -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=512 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.10\" -fast -tp pwr8 -mp -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=sasum_k -DASMFNAME=sasum_k_ -DNAME=sasum_k_ -DCNAME=sasum_k -DCHAR_NAME=\"sasum_k_\" -DCHAR_CNAME=\"sasum_k\" -DNO_AFFINITY -I.. -UDOUBLE -UCOMPLEX -UCOMPLEX -UDOUBLE ../kernel/power/sasum.c -o sasum_k.o
"../kernel/power/sasum_microk_power8.c", line 41: error: identifier "__vector" is undefined
__vector float t0;
^
This is what I'm running into with most recent PGI/NV compilers on Power. I've been exploring trying to get our frontend to support the __vector keyword, but I have not had much luck to this point.
Yes it should, although by disabling these faster code paths (and in some cases, replacing the entire kernel with an #include of its generic C implementation). The #if defined here is stolen & adapted from the GCC altivec.h; not sure yet if it should be "both __VEC__ and __ALTIVEC__ defined" for good measure. kernel/power/sasum.c as an example:
#if defined(POWER8) || defined(POWER9) || defined(POWER10)
+#if defined(__VEC__) || defined(__ALTIVEC__)
#include "sasum_microk_power8.c"
+#endif
#endif
Just realized another "interesting" effect with 19.10 - since #2601, Makefile.system undefines CNAME, NAME etc. before defining them, to avoid nuisance warnings caused by recursive inclusion of this makefile in some circumstances. At least on the Fedora32 host, the -UNAME apparently takes precedence over the subsequent -DNAME=somesymbol so that the objects and library all end up with function entry points called "NAME" rather than sasum etc.
I assume this could not have gone unnoticed if you also encountered this, and Fedora32 is probably not a supported platform for the older 19.10 (judging by the workarounds necessary to fix the missing __acos_finite & friends).
Interestingly this seems to be exactly what wyphan encountered in his Zen2 compilation attempt https://github.com/xianyi/OpenBLAS/issues/2386#issuecomment-660203420 so the problem could be genuine (and not addressed by anything in the patch you kindly provided)
Hmm, that explains why the build completes successfully on Zen2 for 0.3.9, but not for 0.3.10.
PR merged (which should also fix the Zen2 build), but with 19.1 I probably cannot test if the atomics code would work without including stdatomic.h (it reports __STDC_VERSION__ 199901L). Could you post the output of pgcc -dM -E ctest.c from a 20.x version for me to see if it still defines __STDC_NO_ATOMICS__ (which would seem to imply not understanding the keyword _Atomic either)?
@martin-frbg Here it is, for PGI 20.1 on Summit.