rcps-buildscripts
Install Request: GPU build of GROMACS 2021.5
EPSRC work, plus IN:05095607
Do a GPU build of the current version (GROMACS 2021.5)
https://manual.gromacs.org/current/install-guide/index.html
Initial test build going.
Built in my space, now need to test on A100s.
Got the regression tests from https://ftp.gromacs.org/regressiontests/regressiontests-2021.5.tar.gz
https://manual.gromacs.org/current/install-guide/index.html#testing-gromacs-for-correctness
Can get the regression tests as part of the build, but we probably want them as separate steps so we can run the tests as a job afterwards rather than rerunning the whole build.
Once you have downloaded them, unpack the tarball, source GMXRC as described above, and run ./gmxtest.pl all inside the regression tests folder. You can find more options (e.g. adding double when using double precision, or -only expanded to run just the tests whose names match “expanded”) if you just execute the script without options.
Regression tests: helps if we tell it the right suffix... ./gmxtest.pl -suffix _cuda all
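A minimal sketch of what the separate test job could look like, assuming an SGE-style jobscript; the resource requests and the GMXRC path are placeholders rather than the real ones:

```bash
#!/bin/bash -l
# Hypothetical jobscript: fetch the regression tests and run them against
# the already-installed GPU build (paths and resource requests are placeholders).
#$ -pe smp 8
#$ -l gpu=1
#$ -cwd

wget https://ftp.gromacs.org/regressiontests/regressiontests-2021.5.tar.gz
tar -xzf regressiontests-2021.5.tar.gz

# Source GMXRC from the GPU build, as the install guide describes
source /path/to/gromacs/2021.5-gpu/bin/GMXRC

cd regressiontests-2021.5
./gmxtest.pl -suffix _cuda all
```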
Had quite a few failures:
29 out of 83 complex tests FAILED
6 out of 14 freeenergy tests FAILED
All 12 rotation tests PASSED
Essential dynamics tests FAILED with 5 errors!
Example:
home/cceahke/gromacs/regressiontests-2021.5/essentialdynamics/radcon/mdrun.out
Fatal error:
When using GPUs, setting the number of OpenMP threads without specifying the
number of ranks can lead to conflicting demands. Please specify the number of
thread-MPI ranks as well (option -ntmpi).
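For reference, the error is just asking for the rank count to be stated explicitly whenever the OpenMP thread count is set; invoking mdrun directly, that would be something like the following (the -deffnm input name is hypothetical):

```bash
# Give the thread-MPI rank count explicitly alongside the OpenMP setting
gmx_cuda mdrun -ntmpi 1 -deffnm md
```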
That was 1 GPU, 1 core. Quite a few of the tests seem to want 8, so will try that and set -ntmpi.
./gmxtest.pl -suffix _cuda -mdrun -ntmpi $NSLOTS all
Apparently that is the wrong way; gmxtest.pl uses -nt for that:
./gmxtest.pl -nt $NSLOTS -suffix _cuda all
All 89 complex tests PASSED
All 20 freeenergy tests PASSED
All 12 rotation tests PASSED
All 0 extra tests PASSED
All 7 essential dynamics tests PASSED
Woo!
That isn't exactly an efficient way of running it since it massively oversubscribed the CPUs (and thought it had all 36 cores in the first place), but anyway...
Using 8 MPI threads
Non-default thread affinity set, disabling internal thread affinity
Using 8 OpenMP threads per tMPI thread
WARNING: Oversubscribing the available 36 logical CPU cores with 64 threads.
This will cause considerable performance loss.
2 GPUs, -pe smp 16, export OMP_NUM_THREADS=2, ./gmxtest.pl -nt $NSLOTS -suffix _cuda all
(it reran some tests with 8 ranks because they don't work with 16):
Successfully detected 2 gpus.
All 89 complex tests PASSED
All 20 freeenergy tests PASSED
All 12 rotation tests PASSED
All 0 extra tests PASSED
All 7 essential dynamics tests PASSED
Historically we build the mdrun-only version for the MPI builds; this is the suggested approach since mdrun is the only component that uses MPI. But when testing, the test suite wants to test gmx_mpi_cuda and mdrun together, and for mdrun-only builds it says to build the full version first: https://manual.gromacs.org/current/install-guide/index.html#testing-for-mdrun-only-executables
So I'll build the full version for me anyway, even if it doesn't get used later.
Ok, 1 GPU + 8 cores in an MPI environment has passed all tests. Have submitted a full 4 GPUs and given it 32 cores.
Note: we've apparently been putting our suffixes the unexpected way round for cuda + mpi builds, so I changed them back to the expected way: gmx_mpi_cuda rather than gmx_cuda_mpi.
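For reference, a minimal sketch of the CMake options that give a full MPI + CUDA build with that suffix; the install prefix is a placeholder and the real buildscript sets more than this:

```bash
# Sketch only: full (not mdrun-only) MPI + CUDA build with the _mpi_cuda suffix
cmake .. \
  -DCMAKE_INSTALL_PREFIX=/path/to/install \
  -DGMX_MPI=ON \
  -DGMX_GPU=CUDA \
  -DGMX_DEFAULT_SUFFIX=OFF \
  -DGMX_BINARY_SUFFIX=_mpi_cuda \
  -DGMX_LIBS_SUFFIX=_mpi_cuda
```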
4 GPUs, -pe mpi 32
It had to do more rejigging of options during the tests (adding -ntomp 1).
1 out of 88 complex tests FAILED
All 20 freeenergy tests PASSED
All 12 rotation tests PASSED
All 0 extra tests PASSED
All 7 essential dynamics tests PASSED
Testing nbnxn_vsite . . . gmx_mpi_cuda grompp -f /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/grompp.mdp -c /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/conf -r /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/conf -p /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/topol -ref /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/rotref -maxwarn 10 >grompp.out 2>grompp.err
gmx_mpi_cuda check -s1 /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite/reference_s.tpr -s2 topol.tpr -tol 0.0001 -abstol 0.001 >checktpr.out 2>checktpr.err
mpirun -np 6 -wdir /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite gmx_mpi_cuda mdrun -notunepme >mdrun.out 2>&1
Abnormal return value for 'mpirun -np 6 -wdir /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite gmx_mpi_cuda mdrun -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_vsite for nbnxn_vsite
Program: gmx mdrun, version 2021.5
Source file: src/gromacs/taskassignment/taskassignment.cpp (line 306)
Function: static gmx::GpuTaskAssignments gmx::GpuTaskAssignmentsBuilder::build(const std::vector<int>&, const std::vector<int>&, const gmx_hw_info_t&, MPI_Comm, const gmx::PhysicalNodeCommunicator&, gmx::TaskTarget, gmx::TaskTarget, gmx::TaskTarget, gmx::TaskTarget, bool, bool, bool, bool)
MPI rank: 0 (out of 6)
Inconsistency in user input:
There were 6 GPU tasks found on node node-l00a-002.myriad.ucl.ac.uk, but 4
GPUs were available. If the GPUs are equivalent, then it is usually best to
have a number of tasks that is a multiple of the number of GPUs. You should
reconsider your GPU task assignment, number of ranks, or your use of the -nb,
-pme, and -npme options, perhaps after measuring the performance you can get.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
I think that simulation is just the wrong size - it did a domain decomposition for 6 ranks (3x2x1), and didn't readjust based on resources.
README for that test says:
This system tests the nonbonded NxN kernel with Lorentz-Berthelot combination rules (only relevant for x86 SIMD kernels), virtual sites and pressure coupling. This means the vsite OpenMP code is also tested.
Forces and velocities are not compared because they are not reproducible within tolerance. At one time, this test did not work with MPI parallelization. That might have been because the box size was very tight wrt cutoff. This has now been relaxed. Further, verlet-buffer-list is now -1 to hard-code rlist, so that the automatic increase of rlist with GPUs does not make the DD impossible.
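If we wanted to confirm that, the error message's own suggestion is to use a rank count that is a multiple of the GPU count, so rerunning just that case by hand (untested sketch, reusing the paths from the log above) would look like:

```bash
cd /lustre/home/cceahke/gromacs/regressiontests-2021.5/complex/nbnxn_vsite
# 4 ranks instead of 6, matching the 4 GPUs on the node
mpirun -np 4 gmx_mpi_cuda mdrun -notunepme
```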
- [x] Myriad install
- [x] modulefile
To use:
module load beta-modules
module unload -f compilers mpi gcc-libs
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
module load python3
module load cuda/11.3.1/gnu-10.2.0
# these if on Myriad
module load numactl/2.0.12
module load binutils/2.36.1/gnu-10.2.0
module load ucx/1.9.0/gnu-10.2.0
module load mpi/openmpi/4.0.5/gnu-10.2.0
module load gromacs/2021.5/cuda-11.3
Executables are gmx_cuda, gmx_mpi_cuda, mdrun_mpi_cuda.
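A hypothetical single-node, single-GPU jobscript fragment using this module might look like the following (the resource requests and the -deffnm input name are placeholders):

```bash
#!/bin/bash -l
# Placeholders throughout; the module loads listed above go before the mdrun line
#$ -pe smp 8
#$ -l gpu=1
#$ -cwd

# Thread-MPI binary: one process using all allocated cores on the node
gmx_cuda mdrun -nt $NSLOTS -deffnm md
```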
- [x] Young install
cd regressiontests-2021.5
source /shared/ucl/apps/gromacs/2021.5-gpu/gnu-10.2.0/bin/GMXRC
./gmxtest.pl -nt $NSLOTS -suffix _cuda all
All 48 complex tests PASSED
All 10 freeenergy tests PASSED
All 12 rotation tests PASSED
All 0 extra tests PASSED
All 7 essential dynamics tests PASSED