rcps-buildscripts

Install Request: GPU build of LAMMPS [IN05118528] [IN05126499]

Open heatherkellyucl opened this issue 3 years ago • 87 comments

EPSRC work.

Current stable version is 29 Sep 2021 update 2.

https://github.com/lammps/lammps/releases https://docs.lammps.org/stable/

heatherkellyucl avatar Jan 17 '22 10:01 heatherkellyucl

According to the install documentation, LAMMPS now suggests building with CMake instead of system-tailored Makefiles. Our previous builds used the tailored Makefile method.

I'm going to start using the cmake method and try a test build. This will need a new build script.

balston avatar Jan 18 '22 14:01 balston

So we are building LAMMPS 29th September 2021 Update 2. Source downloadable from:

https://github.com/lammps/lammps/archive/refs/tags/stable_29Sep2021_update2.tar.gz

We need two new build scripts:

lammps-29Sep21_2-basic_install  
lammps-29Sep21_2-gpu_install

I'm making first versions of them now.

balston avatar Jan 18 '22 15:01 balston

First attempt at build script ready. Running as ccspapp:

cd /shared/ucl/apps/build_scripts
./lammps-29Sep21_2-basic_install 2>&1 | tee ~/Software/LAMMPS/lammps-29Sep21_2-basic_install.log-19012022-1

balston avatar Jan 19 '22 19:01 balston

First attempt failed:

CMake Error at /lustre/shared/ucl/apps/cmake/3.21.1/gnu-4.9.2/share/cmake-3.21/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Python (missing: Python_INCLUDE_DIRS Python_LIBRARIES
  Development Development.Module Development.Embed)
Call Stack (most recent call first):
  /lustre/shared/ucl/apps/cmake/3.21.1/gnu-4.9.2/share/cmake-3.21/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /lustre/shared/ucl/apps/cmake/3.21.1/gnu-4.9.2/share/cmake-3.21/Modules/FindPython.cmake:556 (find_package_handle_standard_args)
  Modules/Packages/PYTHON.cmake:6 (find_package)
  CMakeLists.txt:445 (include)


-- Configuring incomplete, errors occurred!

Need to load a Python bundle.

balston avatar Jan 20 '22 15:01 balston

This time it has passed the configuration stage and is compiling stuff.

balston avatar Jan 20 '22 15:01 balston

Build finished with no obvious errors but I will need to check the build log carefully.
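
A minimal sketch of a first pass over the build log, assuming the failures of interest show up as "CMake Error", compiler-style "error:" lines, or make's "Error <n>". The function name and the pattern list are illustrative assumptions, not part of the build script, and a clean result does not replace reading the log.

```shell
# Hypothetical helper: print log lines that look like failures, with line numbers.
# The patterns below are assumptions and are not exhaustive.
scan_build_log() {
    log_file="$1"
    # -n prefixes each hit with its line number; -E enables the alternation.
    # Exit status is 0 if anything matched and 1 if nothing did.
    grep -n -E 'CMake Error|error:|Error [0-9]+' "$log_file"
}
```

For example: scan_build_log ~/Software/LAMMPS/lammps-29Sep21_2-basic_install.log-19012022-1 || echo "no obvious errors"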

balston avatar Jan 20 '22 16:01 balston

Added in building shared libraries as this isn't the default and is needed for Plugin loading.

balston avatar Jan 21 '22 15:01 balston

Build has finished and the shared library has been built.

balston avatar Jan 21 '22 17:01 balston

I've now also added an option to build LAMMPS unit tests. Run it like this:

BUILD_UNIT_TESTS=yes ./lammps-29Sep21_2-basic_install 2>&1 | tee ~/Software/LAMMPS/lammps-29Sep21_2-basic_install.log-24012022-1
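
As a rough sketch of how an environment variable like this could be turned into a configure option: LAMMPS's CMake build enables its unit tests with ENABLE_TESTING, so a script might map one onto the other as below. The function name is hypothetical and the actual build script may do this differently.

```shell
# Sketch only: translate BUILD_UNIT_TESTS into the LAMMPS CMake testing option.
# unit_test_flag is a made-up name; anything other than "yes" disables the tests.
unit_test_flag() {
    case "${BUILD_UNIT_TESTS:-no}" in
        yes) printf '%s\n' '-DENABLE_TESTING=on' ;;
        *)   printf '%s\n' '-DENABLE_TESTING=off' ;;
    esac
}
```

The result would then be passed through on the configure line, e.g. cmake $(unit_test_flag) ... ../cmake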

balston avatar Jan 24 '22 14:01 balston

Updating the GPU build script to include the changes made to lammps-29Sep21_2-basic_install, plus the GPU-specific options.

balston avatar Jan 24 '22 14:01 balston

Running the unit test is failing. Running:

module -f unload compilers mpi
module load compilers/nvidia/hpc-sdk/22.1
module load python3/recommended
cd /home/ccspapp/Software/LAMMPS/tmp.tDAcUNTvaj/lammps-stable_29Sep2021_update2/build
ctest -V

gives:

1:  HWLOC_HIDE_ERRORS=1
1: Test timeout computed to be: 1500
1: /home/ccspapp/Software/LAMMPS/tmp.tDAcUNTvaj/lammps-stable_29Sep2021_update2/build/lmp: /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/ccspapp/Software/LAMMPS/tmp.tDAcUNTvaj/lammps-stable_29Sep2021_update2/build/liblammps.so.0)
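
The error means the libstdc++.so.6 found at run time is older than the one the binary was linked against. A minimal sketch for checking whether a given libstdc++ provides a particular GLIBCXX symbol version follows; the function name is hypothetical, and it relies on grep's -a option (GNU and BSD grep) to search the binary as text.

```shell
# Hypothetical helper: succeed if the library file mentions GLIBCXX_<version>.
# -a searches the binary as text, -F treats the pattern as a fixed string,
# -w stops GLIBCXX_3.4.2 from matching inside GLIBCXX_3.4.20, -q keeps it quiet.
has_glibcxx() {
    lib="$1"
    version="$2"
    grep -a -q -w -F "GLIBCXX_${version}" "$lib"
}
```

For example: has_glibcxx /usr/lib64/libstdc++.so.6 3.4.20 || echo "libstdc++ too old"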

balston avatar Jan 25 '22 10:01 balston

So we need to use a more up-to-date gcc-libs module. I'm also going to build versions using the GNU compilers (10.2.0) and OpenMPI, plus CUDA 11 for the GPU build, as well as the Nvidia versions.

balston avatar Jan 26 '22 14:01 balston

Now building the revised basic Nvidia version.

balston avatar Jan 26 '22 14:01 balston

GNU compilers and OpenMPI build script ready to test.

balston avatar Jan 26 '22 14:01 balston

Running:

BUILD_UNIT_TESTS=yes ./lammps-29Sep21_2-basic-gnu_install 2>&1 | tee ~/Software/LAMMPS/lammps-29Sep21_2-basic-gnu_install.log-26012022-1

to build GNU version with unit tests.

balston avatar Jan 26 '22 16:01 balston

The Nvidia build is still not working correctly, so I have switched to the GNU build for the moment.

balston avatar Jan 26 '22 16:01 balston

GNU build for the basic version has completed. I quickly ran the first couple of unit tests and they pass, so I will submit a job to run the full set tomorrow.

Will now set up the GPU version build script.

balston avatar Jan 26 '22 17:01 balston

NOTE: the following modules are needed for the build and runtime for the basic version:

module -f unload compilers mpi gcc-libs
module load  beta-modules
module load  gcc-libs/10.2.0
module load  compilers/gnu/10.2.0
module load  numactl/2.0.12
module load  binutils/2.36.1/gnu-10.2.0
module load  ucx/1.9.0/gnu-10.2.0
module load  mpi/openmpi/4.0.5/gnu-10.2.0
module load  cmake/3.21.1
module load  python3/3.9-gnu-10.2.0

balston avatar Jan 26 '22 17:01 balston

Unit Test job for the GNU basic version submitted.

balston avatar Jan 27 '22 11:01 balston

Unit tests all passed:


100% tests passed, 0 tests failed out of 481

Total Test time (real) = 436.38 sec

balston avatar Jan 27 '22 12:01 balston

Now need a module file and can then try running some real examples.

balston avatar Jan 27 '22 12:01 balston

GNU version of GPU build script ready to run using these configuration options:

cmake -C ../cmake/presets/gcc.cmake -C ../cmake/presets/most.cmake -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_80 -D BUILD_SHARED_LIBS=yes -D CMAKE_INSTALL_PREFIX=${INSTALL_PREFIX} ../cmake

balston avatar Jan 27 '22 14:01 balston

If you can give it multiple GPU architectures, worth doing sm_60, sm_70 and sm_80 so it works on all Myriad's GPUs.

heatherkellyucl avatar Jan 27 '22 14:01 heatherkellyucl

According to the documentation, setting -D GPU_ARCH=sm_80 is a default and it should also include "support for all major GPU architectures supported by" the loaded CUDA module. sm_80 is currently the latest GPU architecture supported by LAMMPS and is in CUDA 11.
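
One way this could be checked after the build, as a sketch: cuobjdump from the CUDA toolkit can list the embedded device ELF images with --list-elf, and the architecture names can be pulled out of that listing. The helper name is hypothetical, and which file actually carries the device code (the library path in the usage line) is an assumption.

```shell
# Hypothetical helper: read a cuobjdump --list-elf listing on stdin and
# print each distinct sm_NN architecture name once, sorted.
list_gpu_archs() {
    # -o prints only the matching part of each line; sort -u removes repeats.
    grep -o -E 'sm_[0-9]+' | sort -u
}
```

For example (library path assumed): cuobjdump --list-elf "${INSTALL_PREFIX}/lib64/liblammps.so" | list_gpu_archs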

balston avatar Jan 27 '22 14:01 balston

Running GPU build:

module -f unload gcc-libs
module load beta-modules
BUILD_UNIT_TESTS=yes ./lammps-29Sep21_2-gpu-gnu_install 2>&1 | tee ~/Software/LAMMPS/lammps-29Sep21_2-gpu-gnu_install.log-27012022-1

balston avatar Jan 27 '22 14:01 balston

GPU build has finished.

Checking to see if it has built correctly.

balston avatar Jan 27 '22 15:01 balston

More work needed on the build script - it has completely ignored building with CUDA!
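
In the LAMMPS CMake build the GPU package is switched on with the PKG_GPU option, so one quick check is whether the configure step recorded it as enabled in CMakeCache.txt. This is a sketch under that assumption; the function name is hypothetical, and it says nothing about why a particular build skipped CUDA.

```shell
# Hypothetical helper: succeed if CMakeCache.txt records PKG_GPU as enabled.
# Cache entries have the form NAME:TYPE=VALUE; the usual "true" spellings
# are accepted, case-insensitively.
gpu_package_enabled() {
    cache_file="$1"
    grep -i -q -E '^PKG_GPU:[A-Za-z]+=(on|yes|true|1)$' "$cache_file"
}
```

For example, from the build directory: gpu_package_enabled CMakeCache.txt || echo "GPU package not enabled"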

balston avatar Jan 27 '22 15:01 balston

IN05118528 wants to use the basic (MPI) version on Young now.

balston avatar Jan 28 '22 10:01 balston

Basic GNU build version module ready to use for testing on Myriad. Needs the following module commands:

module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
module load python3/3.9-gnu-10.2.0

# The following three are only needed on Myriad.

module load numactl/2.0.12
module load binutils/2.36.1/gnu-10.2.0
module load ucx/1.9.0/gnu-10.2.0


module load mpi/openmpi/4.0.5/gnu-10.2.0
module load lammps/29sep21up2/basic/gnu-10.2.0

balston avatar Jan 28 '22 14:01 balston

Building the basic non-GPU version on Kathleen to test multi-node runs prior to building on Young.

Fixed (I think) the GPU build script to actually build the GPU version! Building it again.

balston avatar Jan 31 '22 11:01 balston