
Install Request: GPU build of GROMACS 2021.5

heatherkellyucl opened this issue 3 years ago · 16 comments

EPSRC work, plus IN:05095607

Do a GPU build of the current version (GROMACS 2021.5)

https://manual.gromacs.org/current/install-guide/index.html
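
For reference, the install guide's quick-and-dirty build with CUDA enabled looks roughly like this (the install prefix and core count are placeholders, not the final buildscript):

tar xfz gromacs-2021.5.tar.gz
cd gromacs-2021.5
mkdir build && cd build
# GMX_GPU=CUDA turns on the CUDA build; GMX_BUILD_OWN_FFTW fetches FFTW for the CPU kernels
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=CUDA \
         -DCMAKE_INSTALL_PREFIX=/path/to/install   # placeholder prefix
make -j 8
make install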

heatherkellyucl · Jan 17 '22 10:01

Initial test build going.

heatherkellyucl · Jan 18 '22 11:01

Built in my space, now need to test on A100s.

heatherkellyucl · Jan 18 '22 15:01

Got the regression tests from https://ftp.gromacs.org/regressiontests/regressiontests-2021.5.tar.gz

https://manual.gromacs.org/current/install-guide/index.html#testing-gromacs-for-correctness

We can get the regression tests as part of the build, but we probably want them as a separate step so we can run the tests as a job afterwards rather than rerunning the whole build.

From the install guide: "Once you have downloaded them, unpack the tarball, source GMXRC as described above, and run ./gmxtest.pl all inside the regression tests folder. You can find more options (e.g. adding double when using double precision, or -only expanded to run just the tests whose names match “expanded”) if you just execute the script without options."
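
A rough sketch of running them as a separate job afterwards (scheduler directives and the install path are illustrative; -DREGRESSIONTEST_DOWNLOAD=ON is the build-time alternative mentioned above):

#!/bin/bash -l
#$ -l h_rt=2:00:00
#$ -l gpu=1            # assumed GPU resource request syntax
#$ -pe smp 8
#$ -cwd

# point the environment at the installed build, then drive the tests against it
source /path/to/gromacs/2021.5-gpu/bin/GMXRC   # placeholder install path
cd regressiontests-2021.5
./gmxtest.pl all       # plus -suffix/-nt options as needed (see below)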

heatherkellyucl · Jan 19 '22 10:01

Regression tests: it helps if we tell it the right suffix: ./gmxtest.pl -suffix _cuda all

heatherkellyucl · Jan 20 '22 11:01

Had quite a few failures:

29 out of 83 complex tests FAILED
6 out of 14 freeenergy tests FAILED
All 12 rotation tests PASSED
Essential dynamics tests FAILED with 5 errors!

Example: home/cceahke/gromacs/regressiontests-2021.5/essentialdynamics/radcon/mdrun.out

Fatal error:
When using GPUs, setting the number of OpenMP threads without specifying the
number of ranks can lead to conflicting demands. Please specify the number of
thread-MPI ranks as well (option -ntmpi).

That was 1 GPU, 1 core - quite a few of the tests seem to want 8, so will try that and set -ntmpi.

./gmxtest.pl -suffix _cuda -mdrun -ntmpi $NSLOTS all

heatherkellyucl · Jan 20 '22 14:01

Apparently that is the wrong way round; gmxtest.pl uses -nt for that:

./gmxtest.pl -nt $NSLOTS -suffix _cuda all

heatherkellyucl · Jan 20 '22 16:01

All 89 complex tests PASSED
All 20 freeenergy tests PASSED
All 12 rotation tests PASSED
All 0 extra tests PASSED
All 7 essential dynamics tests PASSED

Woo!

heatherkellyucl · Jan 21 '22 10:01

That isn't exactly an efficient way of running it, since it massively oversubscribed the CPUs (and it thought it had all 36 cores in the first place), but anyway...

Using 8 MPI threads

Non-default thread affinity set, disabling internal thread affinity

Using 8 OpenMP threads per tMPI thread


WARNING: Oversubscribing the available 36 logical CPU cores with 64 threads.
         This will cause considerable performance loss.
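
For a standalone mdrun run with the thread-MPI binary, the usual way to avoid that is to keep ranks × OpenMP threads within the allocated cores; a hedged illustration (not the gmxtest.pl invocation itself):

# e.g. on an 8-core allocation: 8 thread-MPI ranks x 1 OpenMP thread each
gmx_cuda mdrun -ntmpi 8 -ntomp 1 -nb gpu -s topol.tpr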

heatherkellyucl · Jan 21 '22 10:01

2 GPUs, -pe smp 16, export OMP_NUM_THREADS=2, ./gmxtest.pl -nt $NSLOTS -suffix _cuda all (it reran some tests with 8 ranks because they don't work with 16):

Successfully detected 2 gpus.

All 89 complex tests PASSED
All 20 freeenergy tests PASSED
All 12 rotation tests PASSED
All 0 extra tests PASSED
All 7 essential dynamics tests PASSED

heatherkellyucl · Jan 24 '22 10:01

Historically we build the mdrun-only version for the MPI builds. This is the suggested approach for production use, because mdrun is the only component that uses MPI. But when testing, the test suite wants to test gmx_mpi_cuda and mdrun together, and for mdrun-only builds it says to build the full version first: https://manual.gromacs.org/current/install-guide/index.html#testing-for-mdrun-only-executables

So I'll build the full version for me anyway, even if it doesn't get used later.
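
For reference, the difference between the two comes down to one CMake switch; a sketch of the relevant flags only (not the actual buildscript):

# full build: installs gmx (grompp, check, mdrun, ...), which the test suite needs
cmake .. -DGMX_GPU=CUDA

# historical mdrun-only MPI build: installs just mdrun
cmake .. -DGMX_GPU=CUDA -DGMX_MPI=ON -DGMX_BUILD_MDRUN_ONLY=ON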

heatherkellyucl · Jan 24 '22 15:01

Ok, 1 GPU + 8 cores in an MPI environment has passed all tests. Have submitted a full 4 GPUs and given it 32 cores.

Note: we've apparently been putting our suffixes the unexpected way round for cuda + mpi builds, so I changed them back to the expected order: gmx_mpi_cuda rather than gmx_cuda_mpi.
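
The suffix ordering comes from the CMake suffix options; a sketch of how the gmx_mpi_cuda naming would be set (assuming explicit suffixes rather than GROMACS's default _mpi):

cmake .. -DGMX_MPI=ON -DGMX_GPU=CUDA \
         -DGMX_DEFAULT_SUFFIX=OFF \
         -DGMX_BINARY_SUFFIX=_mpi_cuda \
         -DGMX_LIBS_SUFFIX=_mpi_cuda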

heatherkellyucl · Jan 25 '22 11:01

4 GPUs, -pe mpi 32. It had to do more rejigging of options during the tests (adding -ntomp 1).

1 out of 88 complex tests FAILED
All 20 freeenergy tests PASSED
All 12 rotation tests PASSED
All 0 extra tests PASSED
All 7 essential dynamics tests PASSED
Testing nbnxn_vsite . . . gmx_mpi_cuda grompp -f /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/grompp.mdp -c /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/conf -r /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/conf -p /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/topol -ref /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/rotref -maxwarn 10 >grompp.out 2>grompp.err
gmx_mpi_cuda check -s1 /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/reference_s.tpr -s2 topol.tpr -tol 0.0001 -abstol 0.001 >checktpr.out 2>checktpr.err
mpirun -np 6 -wdir /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite gmx_mpi_cuda mdrun -notunepme >mdrun.out 2>&1

Abnormal return value for 'mpirun -np 6 -wdir /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite gmx_mpi_cuda mdrun -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_vsite for nbnxn_vsite
Program:     gmx mdrun, version 2021.5
Source file: src/gromacs/taskassignment/taskassignment.cpp (line 306)
Function:    static gmx::GpuTaskAssignments gmx::GpuTaskAssignmentsBuilder::build(const std::vector<int>&, const std::vector<int>&, const gmx_hw_info_t&, MPI_Comm, const gmx::PhysicalNodeCommunicator&, gmx::TaskTarget, gmx::TaskTarget, gmx::TaskTarget, gmx::TaskTarget, bool, bool, bool, bool)
MPI rank:    0 (out of 6)

Inconsistency in user input:
There were 6 GPU tasks found on node node-l00a-002.myriad.ucl.ac.uk, but 4
GPUs were available. If the GPUs are equivalent, then it is usually best to
have a number of tasks that is a multiple of the number of GPUs. You should
reconsider your GPU task assignment, number of ranks, or your use of the -nb,
-pme, and -npme options, perhaps after measuring the performance you can get.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

I think that test system is just the wrong size for this setup - it did a domain decomposition for 6 ranks (3x2x1) and didn't readjust based on the available resources, so 6 GPU tasks ended up on 4 GPUs.

README for that test says:

This system tests the nonbonded NxN kernel with Lorentz-Berthelot combination rules (only relevant for x86 SIMD kernels), virtual sites and pressure coupling. This means the vsite OpenMP code is also tested.

Forces and velocities are not compared because they are not reproducible within tolerance. At one time, this test did not work with MPI parallelization. That might have been because the box size was very tight wrt cutoff. This has now been relaxed. Further, verlet-buffer-list is now -1 to hard-code rlist, so that the automatic increase of rlist with GPUs does not make the DD impossible.
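
The general fix for the GPU task mismatch above is to keep the number of ranks a multiple of the number of GPUs, for example (illustrative launch, not what gmxtest.pl itself ran):

# 4 GPUs on one node: 4 ranks, 8 OpenMP threads each, nonbondeds offloaded to the GPUs
mpirun -np 4 gmx_mpi_cuda mdrun -ntomp 8 -nb gpu -s topol.tpr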

heatherkellyucl · Jan 25 '22 13:01

  • [x] Myriad install
  • [x] modulefile

heatherkellyucl · Jan 26 '22 09:01

To use:

module load beta-modules
module unload -f compilers mpi gcc-libs
module load gcc-libs/10.2.0 
module load compilers/gnu/10.2.0 
module load python3 
module load cuda/11.3.1/gnu-10.2.0 

# these if on Myriad
module load numactl/2.0.12 
module load binutils/2.36.1/gnu-10.2.0 
module load ucx/1.9.0/gnu-10.2.0 

module load mpi/openmpi/4.0.5/gnu-10.2.0 
module load gromacs/2021.5/cuda-11.3

Executables are gmx_cuda, gmx_mpi_cuda, mdrun_mpi_cuda.
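
A sketch of a user job script on Myriad with those modules (run-time, GPU and core requests are illustrative, and the gpu resource syntax is an assumption):

#!/bin/bash -l
#$ -l h_rt=24:00:00
#$ -l gpu=2            # assumed GPU request syntax
#$ -pe smp 16
#$ -cwd

module load beta-modules
module unload -f compilers mpi gcc-libs
module load gcc-libs/10.2.0 compilers/gnu/10.2.0 python3 cuda/11.3.1/gnu-10.2.0
module load numactl/2.0.12 binutils/2.36.1/gnu-10.2.0 ucx/1.9.0/gnu-10.2.0
module load mpi/openmpi/4.0.5/gnu-10.2.0 gromacs/2021.5/cuda-11.3

# thread-MPI binary: one rank per GPU, 8 OpenMP threads each (2 x 8 = 16 cores)
gmx_cuda mdrun -ntmpi 2 -ntomp 8 -nb gpu -s topol.tpr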

heatherkellyucl · Jan 27 '22 12:01

  • [x] Young install

heatherkellyucl · Jun 30 '22 11:06

cd regressiontests-2021.5
source /shared/ucl/apps/gromacs/2021.5-gpu/gnu-10.2.0/bin/GMXRC
./gmxtest.pl -nt $NSLOTS -suffix _cuda all

All 48 complex tests PASSED
All 10 freeenergy tests PASSED
All 12 rotation tests PASSED
All 0 extra tests PASSED
All 7 essential dynamics tests PASSED

heatherkellyucl · Jun 30 '22 14:06