Properly support the Intel compiler with the CMake build
(This only affects the CMake build)
Currently, we always pass `-fno-automatic` as a compiler flag, even if the user adds their own flags (by setting `CMAKE_Fortran_FLAGS`). This is a problem for e.g. ifort, which uses a different name for that flag.
With this change, if the user decides to customize the flags by passing their own `CMAKE_Fortran_FLAGS`, we no longer set `-fno-automatic` automatically, which solves that problem. The one caveat is that the user then needs to pass `-fno-automatic` (or the compiler's equivalent) explicitly themselves.
Question to anyone who might know: do we actually need `-fno-automatic` for GRASP? It changes the way SAVE attributes are handled, but is there any part of GRASP that actually requires this flag?
Fix #68
I'm pretty sure that flag is (or at least was) needed, but I don't remember off the top of my head why.
The `-fno-automatic` flag was needed because early FORTRAN compilers always saved the values of local variables when a routine was exited, whereas F90 does not; in other words, a local variable assigned in one call would still hold that value on the next call, and old code sometimes relies on this instead of declaring SAVE explicitly. I suspect the need is reduced, but I am not sure it has been tested. Flags always depend on the compiler.
Alright, new approach (since checking whether the user has modified CMAKE_Fortran_FLAGS wasn't reliable):
- We still automatically append `-fno-automatic` to `CMAKE_Fortran_FLAGS` if we detect that it's gfortran.
- With ifort, we append `-save` instead.
- Other compilers will print a warning and won't append anything automatically.
- You can disable the automatic append completely by passing `-DGRASP_DEFAULT_FLAGS=FALSE` to `cmake` (see the sketch after this list).
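For reference, here is a minimal sketch of what this detection logic could look like in CMakeLists.txt; the actual implementation may differ in details, but the `GRASP_DEFAULT_FLAGS` option and the flag names are the ones listed above, and the compiler IDs are standard CMake values:

```cmake
# Sketch only: append compiler-specific default flags unless the user disabled it.
option(GRASP_DEFAULT_FLAGS "Automatically append compiler-specific default Fortran flags" ON)
if(GRASP_DEFAULT_FLAGS)
    if(CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
        # gfortran: give local variables SAVE semantics, as the legacy code expects
        set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -fno-automatic")
    elseif(CMAKE_Fortran_COMPILER_ID STREQUAL "Intel")
        # ifort: -save is the ifort equivalent of gfortran's -fno-automatic
        set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -save")
    else()
        message(WARNING "Unsupported Fortran compiler (${CMAKE_Fortran_COMPILER_ID}); no default flags appended")
    endif()
endif()
```

Passing `-DGRASP_DEFAULT_FLAGS=FALSE` on the command line pre-populates the cache, so the `option()` call does not override it and the whole block is skipped.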
@jongrumer could you check that this does the right thing in a live ifort environment?
Ok - I know this should be in a separate PR, but to speed things up a bit I added the mkdir fix we found in `mpi90/sys_mkdir`, and I also included the `-mkl` flag in the default ifort flags in CMakeLists.txt to turn on MKL. With the new, freely available ifort, which now also ships with MPI and MKL (!), this is of course the way to do it if one is using ifort. Just make sure you install both the Base kit and the HPC kit (the former contains MKL, and the latter includes the compiler and MPI). Just remove these two commits if you (@mortenpi) think this is completely out of line. It will be interesting to see whether there are any speedups when running with Intel all the way; a quick test is given further below.
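(For reference, a rough sketch of what the `-mkl` addition could look like in the Intel branch of the default-flags logic sketched earlier; the exact change in the actual commits may differ.)

```cmake
# Sketch: turn on MKL for ifort builds by appending -mkl to the default Intel flags
set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -mkl")
```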
Intel Ifort + MPI/MKL (HPC) instructions for Linux: https://software.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/installation/install-using-package-managers/apt.html#apt_PACKAGES
Mac and Windows users have to download the installers (note that the Mac version does not seem to ship with MPI).
Compiling with CMake, using the new Intel oneAPI kits (Base + HPC) and including the `-mkl` flag via the addition to CMakeLists.txt mentioned above, I get the following linked libraries for e.g. rmcdhf_mpi. It seems more or less fine, but I'm not entirely sure why e.g. libgfortran.so.4 and OpenBLAS are still in there... needs further investigation.
```
ldd rmcdhf_mpi
linux-vdso.so.1 (0x00007ffd2f969000)
libmkl_intel_lp64.so.1 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so.1 (0x0000151c141f3000)
libmkl_intel_thread.so.1 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_thread.so.1 (0x0000151c108fe000)
libmkl_core.so.1 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_core.so.1 (0x0000151c07363000)
libiomp5.so => /opt/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64_lin/libiomp5.so (0x0000151c06f4c000)
libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x0000151c06cf1000)
liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x0000151c0646b000)
libmpifort.so.12 => /opt/intel/oneapi/mpi/2021.2.0//lib/libmpifort.so.12 (0x0000151c060ad000)
libmpi.so.12 => /opt/intel/oneapi/mpi/2021.2.0//lib/release/libmpi.so.12 (0x0000151c04de7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000151c04be3000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000151c049db000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000151c047bc000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000151c0441e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000151c0402d000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000151c03e15000)
/lib64/ld-linux-x86-64.so.2 (0x0000151c14f58000)
libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x0000151c01b6f000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x0000151c01790000)
libfabric.so.1 => /opt/intel/oneapi/mpi/2021.2.0//libfabric/lib/libfabric.so.1 (0x0000151c0154a000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x0000151c01303000)
```
And this is what it looks like for a gfortran/OpenMPI build (no surprises, GNU all the way):
```
ldd rmcdhf_mpi
linux-vdso.so.1 (0x00007ffe37bd4000)
libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x0000149863537000)
liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x0000149862cb1000)
libmpi_mpifh.so.20 => /usr/lib/x86_64-linux-gnu/libmpi_mpifh.so.20 (0x0000149862a5a000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x000014986267b000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00001498622dd000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00001498620c5000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000149861cd4000)
libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x000014985fa2e000)
libmpi.so.20 => /usr/lib/x86_64-linux-gnu/libmpi.so.20 (0x000014985f73c000)
libopen-pal.so.20 => /usr/lib/x86_64-linux-gnu/libopen-pal.so.20 (0x000014985f48a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000014985f26b000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x000014985f024000)
/lib64/ld-linux-x86-64.so.2 (0x0000149863a71000)
libopen-rte.so.20 => /usr/lib/x86_64-linux-gnu/libopen-rte.so.20 (0x000014985ed9c000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x000014985eb94000)
libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x000014985e957000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000014985e753000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x000014985e550000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x000014985e345000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x000014985e13b000)
```
Quick test case (in progress!): a simple RMCDHF_MPI + RCI_MPI + TRANSITIONS_MPI run (8 processes) on O I 2p4, with SDT excitations from 2p4 only, first layer (3s,3p,3d,4f,5g,6h), with a reduction in the RMCDHF run and the full list in RCI. With the two setups above this gives the following timings (time stamps are printed when each individual program starts, plus the total execution time at the end).
```
ifort (-O3 -save -mkl) + MPI + MKL and using Intel mpirun
---------------
LAYER: as1
NEW SHELLS: 3s,3p,3d,4f,5g,6h
OPTIMIZED: 3s* 3p* 3d* 4f* 5g* 6h*
== Tue Jun 8 15:55:32 CEST 2021 == rcsfgenerate
== Tue Jun 8 15:55:33 CEST 2021 == rangular
== Tue Jun 8 15:55:33 CEST 2021 == rwfnestimate
== Tue Jun 8 15:55:33 CEST 2021 == rmcdhf (Iteration number 11)
== Tue Jun 8 15:55:35 CEST 2021 == rci
== Tue Jun 8 15:55:46 CEST 2021 == jj2lsj
== Tue Jun 8 15:55:47 CEST 2021 == rtransition
== Tue Jun 8 15:55:56 CEST 2021 == done
Total Execution time - 0 hours 0 min 25 sec
```
```
gfortran-9 (-O3 -fno-automatic) + OpenMPI and using GNU mpirun
---------------------
LAYER: as1
NEW SHELLS: 3s,3p,3d,4f,5g,6h
OPTIMIZED: 3s* 3p* 3d* 4f* 5g* 6h*
== Tue Jun 8 15:50:11 CEST 2021 == rcsfgenerate + rcsfinteract
== Tue Jun 8 15:50:12 CEST 2021 == rangular
== Tue Jun 8 15:50:12 CEST 2021 == rwfnestimate
== Tue Jun 8 15:50:12 CEST 2021 == rmcdhf (Iteration number 11)
== Tue Jun 8 15:50:27 CEST 2021 == rci
== Tue Jun 8 15:51:10 CEST 2021 == jj2lsj
== Tue Jun 8 15:51:10 CEST 2021 == rtransition
== Tue Jun 8 15:51:18 CEST 2021 == done
Total Execution time - 0 hours 1 min 8 sec
```
Ok, this is actually cool. With a proper Intel ifort+MKL+MPI installation, just doing
```
FC=ifort BLA_VENDOR=Intel10_64lp_seq ./configure.sh
```
seems to automatically configure a CMake build that uses MKL (via FindBLAS) and also links against the Intel MPI.
I am not quite sure that adding -mkl is the right way to go. If you don't specify BLA_VENDOR=Intel10_64lp_seq, FindBLAS will still try to link against the system OpenBLAS (if available). Maybe the more correct thing would be to set BLA_VENDOR if we detect the Intel compiler?
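A rough sketch of what that could look like, assuming it runs before the `find_package(BLAS)`/`find_package(LAPACK)` calls and that the user has not already chosen a vendor themselves:

```cmake
# Sketch: default to MKL's sequential LP64 interface when compiling with ifort,
# so FindBLAS does not silently pick up e.g. the system OpenBLAS instead.
if(CMAKE_Fortran_COMPILER_ID STREQUAL "Intel" AND NOT DEFINED BLA_VENDOR)
    set(BLA_VENDOR Intel10_64lp_seq)
endif()
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)
```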
Ok great! I'll try setting BLA_VENDOR then, though it seems unreasonably complicated... But we still need to set `-mkl` to make sure MKL is also used for all the other things, no? Or what are your thoughts there? I just remembered that there might be an mklvars.sh that should be sourced... at least there used to be something like that.
EDIT: It seems like `-mkl` should be enough, at least if you have properly sourced /opt/intel/oneapi/setvars.sh - https://software.intel.com/content/www/us/en/develop/articles/using-mkl-in-intel-compiler-mkl-qmkl-options.html