f-exx-opt: Results are dependent on compiler/library versions and # of MPI ranks

Open connoraird opened this issue 1 year ago • 1 comments
Description

In the branch f-exx-opt the Conquest_out results appear to be dependent on the compiler/libraries and the number of MPI ranks you use.

Compiler and library versions and their results

All results below are from running the benchmark test_EXX_isol_C2H4_4proc_PBE0ERI_fullSZP_0.4_SCF.

Mac, Apple ARM

On an M2 Mac, the following Homebrew versions of each dependency were used.

fftw v3.3.10
scalapack v2.2.0_1
openblas v0.3.27
libxc v6.2.2
lapack v3.12.0
openmpi v5.0.3
gcc (gfortran) v14.1.0

With the above, the Harris-Foulkes energy obtained is:

np-2, nt-1:
      |* Harris-Foulkes energy   =       -13.443098419459020 Ha
np-4, nt-1:
      |* Harris-Foulkes energy   =       -14.028830939450383 Ha

Linux

On a UCL cluster (myriad) the following versions were used.

fftw v3.3.8
scalapack v2.1.0
openblas v0.3.7
libxc v6.2.2 compiled myself with gcc v9.2.0
lapack from the above openblas
openmpi v3.1.5
gcc (gfortran) v9.2.0

With the above, the Harris-Foulkes energy obtained was:

np-2, nt-1:
      |* Harris-Foulkes energy   =       -13.300253717139059 Ha
np-4, nt-1:
      |* Harris-Foulkes energy   =       -13.561798008220549 Ha

connoraird, May 14 '24 13:05

Profiling results for test_EXX_isol_C2H4_4proc_PBE0ERI_fullSZP_0.4_SCF on myriad using the intel compiler and the following modules.

Currently Loaded Modulefiles:
 1) beta-modules               5) libxc/6.2.2/intel-2022   9) git/2.41.0-lfs-3.3.0
 2) gcc-libs/10.2.0            6) cmake/3.21.1            10) emacs/28.1
 3) compilers/intel/2022.2     7) python/3.8.6            11) userscripts/1.4.0
 4) mpi/intel/2021.6.0/intel   8) gerun

The results of these runs are:

np-2:
      |* Harris-Foulkes energy   =       -13.038957181266369 Ha
np-4:
      |* Harris-Foulkes energy   =       -12.967703483818312 Ha

When running with 4 ranks and 2 threads (np-4), about 25% of the time for rank 0's primary thread is spent at a call to MPI_Wait.

When running with 2 ranks and 4 threads (np-2), no time is spent in this call for the primary thread of rank 0.
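To put numbers on the rank dependence, the np-2 and np-4 energies reported in this issue can be compared with a short script. This is only a sketch: the setup labels (`mac-gcc14`, `myriad-gcc9`, `myriad-intel2022`) and the helper names are made up for illustration, and taking the last matching line as the final energy is an assumption about how `Conquest_out` is laid out.

```python
import re

# Matches lines such as "|* Harris-Foulkes energy   =   -13.038957181266369 Ha".
# The optional whitespace before "Ha" also tolerates "...059Ha".
ENERGY_RE = re.compile(r"Harris-Foulkes energy\s*=\s*(-?\d+\.\d+)\s*Ha")


def harris_foulkes_energy(text):
    """Return the last Harris-Foulkes energy (in Ha) found in text, or None.

    Assumption: if several energies are printed, the last one is the final value.
    """
    matches = ENERGY_RE.findall(text)
    return float(matches[-1]) if matches else None


# Output lines quoted in this issue, keyed by (hypothetical setup label, MPI ranks).
reported = {
    ("mac-gcc14", 2): "|* Harris-Foulkes energy = -13.443098419459020 Ha",
    ("mac-gcc14", 4): "|* Harris-Foulkes energy = -14.028830939450383 Ha",
    ("myriad-gcc9", 2): "|* Harris-Foulkes energy = -13.300253717139059Ha",
    ("myriad-gcc9", 4): "|* Harris-Foulkes energy = -13.561798008220549 Ha",
    ("myriad-intel2022", 2): "|* Harris-Foulkes energy = -13.038957181266369 Ha",
    ("myriad-intel2022", 4): "|* Harris-Foulkes energy = -12.967703483818312 Ha",
}


def rank_spread(setup):
    """Absolute difference (in Ha) between the 2-rank and 4-rank energies."""
    e2 = harris_foulkes_energy(reported[(setup, 2)])
    e4 = harris_foulkes_energy(reported[(setup, 4)])
    return abs(e4 - e2)


for setup in ("mac-gcc14", "myriad-gcc9", "myriad-intel2022"):
    print(f"{setup}: |E(np=4) - E(np=2)| = {rank_spread(setup):.6f} Ha")
```

In practice the strings in `reported` would be replaced by the contents of each run's `Conquest_out`; a spread well above numerical noise between rank counts for the same build is the symptom described here.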

connoraird, May 14 '24 15:05

I think that this is now fixed by the commits above (May 30). @connoraird Can you confirm? If so, we should close the issue.

davidbowler, Sep 11 '24 13:09

@davidbowler The commits above solved the issue insofar as they ensure that exx_phi_on_grid is always called with an xyz input initialised to zero, which seems to work for the current test cases. However, I'm unsure whether this will always be the desired behaviour. I believe there may be some future work to make sure we are passing the correct values to exx_phi_on_grid, but as far as this bug is concerned, I'd say it is resolved.

In case it helps in the future, I'm adding a description I wrote at the time of how I solved the bug:

"I noticed that the issue was present for m_kern_exx_eri but not the GTO version. A key difference between the two is that the GTO version does not call exx_phi_on_grid. Therefore, I assumed the issue could be in there. I then noticed that the first call to exx_phi_on_grid in m_kern_exx_eri is different to the one in m_kern_exx_cri. In the CRI version xyz_zero is passed as the xyz param but kg%xyz was passed for the ERI version. When I changed this so both pass xyz_zero, the code now works as expected."

connoraird, Sep 16 '24 08:09
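As a rough illustration of why the value passed as the xyz argument matters, here is a toy in Python. It is not CONQUEST code: `phi_on_grid_toy`, the 1D Gaussian, the grid size and the offset value 1.3 are all hypothetical stand-ins, chosen only to show that sampling the same function on a finite grid gives a different result when the origin argument is non-zero.

```python
import math


def phi_on_grid_toy(offset, n=9, spacing=0.5, alpha=1.0):
    """Toy stand-in (NOT the CONQUEST routine).

    Samples exp(-alpha * (x - offset)**2) on n points centred on x = 0.
    """
    half = (n - 1) / 2
    return [
        math.exp(-alpha * ((i - half) * spacing - offset) ** 2)
        for i in range(n)
    ]


def grid_sum(values, spacing=0.5):
    """Simple rectangle-rule sum of the sampled values over the grid."""
    return spacing * sum(values)


# Analogue of passing xyz_zero: the function is centred on the grid.
centred = grid_sum(phi_on_grid_toy(0.0))

# Analogue of passing some other position: 1.3 is an arbitrary, hypothetical
# value, so part of the function falls outside the finite grid.
shifted = grid_sum(phi_on_grid_toy(1.3))

print(f"zero offset:     {centred:.6f}")
print(f"non-zero offset: {shifted:.6f}")
```

The two sums differ, so two kernels that pass different origins to the same grid routine should not be expected to agree. Whether zero is the right origin in every case is the open question raised in the comment above.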