CCPP MPI interface
Description
Many schemes perform I/O in their initialization phases, but this I/O is not guarded by MPI commands. Adding these MPI commands, and their associated error checking, to each scheme individually introduces redundancies.
Explanation: These schemes currently read their input files on every MPI task at the same time. This can cause problems on parallel file systems with large core counts, as recently experienced on the DoD HPCMP system Narwhal. Reading the data on the MPI root rank only and then broadcasting it resolves the problem. However, coding these MPI broadcast calls directly, capturing any error and reporting it in a CCPP-compliant way, is tedious and results in several lines of code for each MPI call. This boilerplate can be hidden behind a CCPP MPI interface that takes care of the CCPP-specific aspects.
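For illustration, a single hand-coded broadcast with CCPP-compliant error handling looks roughly like the sketch below; errmsg and errflg are the standard CCPP error-handling arguments of the calling scheme, and the remaining names are placeholders borrowed from the usage example further down:

   ! Sketch only: the boilerplate needed for one CCPP-compliant broadcast.
   ! h2o_coeff, mpiroot, mpicomm and ierr are placeholders.
   call mpi_bcast(h2o_coeff, 1, MPI_INTEGER, mpiroot, mpicomm, ierr)
   if (ierr /= MPI_SUCCESS) then
      write(errmsg,'(a)') 'Error in mpi_bcast of h2o_coeff in read_h2odata'
      errflg = 1
      return
   end if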
Solution
Create a CCPP MPI interface
@DomHeinzeller @nusbaume @peverwhee @cacraigucar
Would it be possible to add a few examples and/or pseudo code with this issue? Just to make sure we're on the same page as to how this would look to consumers and from a maintenance perspective.
With the caveat that this is a simplistic version that does not address all the CCPP requirements (it must not stop the model, etc.; ccpp_external_abort is a non-CCPP-compliant NEPTUNE routine) and that does not yet take advantage of abstracting the MPI calls in a module:
module mpiutil

   use iso_fortran_env, only : real32, real64
   use mpi

   implicit none

   private
   public ccpp_bcast

   interface ccpp_bcast
      procedure :: bcast_i32d0
      procedure :: bcast_i32d1
      procedure :: bcast_i32d2
      procedure :: bcast_i32d3
      procedure :: bcast_r32d0
      procedure :: bcast_r64d0
      procedure :: bcast_r32d1
      procedure :: bcast_r64d1
      procedure :: bcast_r32d2
      procedure :: bcast_r64d2
      procedure :: bcast_r32d3
      procedure :: bcast_r64d3
      procedure :: bcast_r32d4
      procedure :: bcast_r64d4
      procedure :: bcast_r32d5
      procedure :: bcast_r64d5
      procedure :: bcast_ld0
   end interface ccpp_bcast

contains

   ! Helper routines for MPI broadcasting

   subroutine bcast_i32d0(arr, root, comm, ierr)
      integer, intent(inout) :: arr
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, 1, MPI_INTEGER, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_i32d0")
      end if
   end subroutine bcast_i32d0

   subroutine bcast_i32d1(arr, root, comm, ierr)
      integer, intent(inout) :: arr(:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_INTEGER, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_i32d1")
      end if
   end subroutine bcast_i32d1

   subroutine bcast_i32d2(arr, root, comm, ierr)
      integer, intent(inout) :: arr(:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_INTEGER, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_i32d2")
      end if
   end subroutine bcast_i32d2

   subroutine bcast_i32d3(arr, root, comm, ierr)
      integer, intent(inout) :: arr(:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_INTEGER, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_i32d3")
      end if
   end subroutine bcast_i32d3

   subroutine bcast_r32d0(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, 1, MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d0")
      end if
   end subroutine bcast_r32d0

   subroutine bcast_r64d0(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, 1, MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d0")
      end if
   end subroutine bcast_r64d0

   subroutine bcast_r32d1(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d1")
      end if
   end subroutine bcast_r32d1

   subroutine bcast_r64d1(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d1")
      end if
   end subroutine bcast_r64d1

   subroutine bcast_r32d2(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d2")
      end if
   end subroutine bcast_r32d2

   subroutine bcast_r64d2(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d2")
      end if
   end subroutine bcast_r64d2

   subroutine bcast_r32d3(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d3")
      end if
   end subroutine bcast_r32d3

   subroutine bcast_r64d3(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d3")
      end if
   end subroutine bcast_r64d3

   subroutine bcast_r32d4(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:,:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d4")
      end if
   end subroutine bcast_r32d4

   subroutine bcast_r64d4(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:,:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d4")
      end if
   end subroutine bcast_r64d4

   subroutine bcast_r32d5(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:,:,:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d5")
      end if
   end subroutine bcast_r32d5

   subroutine bcast_r64d5(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:,:,:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d5")
      end if
   end subroutine bcast_r64d5

   subroutine bcast_ld0(arr, root, comm, ierr)
      logical, intent(inout) :: arr
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, 1, MPI_LOGICAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_ld0")
      end if
   end subroutine bcast_ld0

end module mpiutil
Thanks @climbfuji! Not to add more to your plate, but could we add some details about what this would look like to users with the #ifdef ... construct we mentioned? Or are there other mechanisms that we could use to interface with this?
I don't have an #ifdef example at hand. Here is example code showing how to use the above ccpp_bcast (again, simplistic, since I'll need to redo a lot of this when updating to the latest ccpp-physics code and to whatever solution we as a group come up with for the MPI call wrappers).
subroutine read_h2odata (h2o_phys, mpicomm, mpirank, mpiroot)

   use machine, only: kind_phys
   use mpiutil, only: ccpp_bcast
   use h2o_def
   ...

   read_and_broadcast_1: if (mpirank==mpiroot) then
      open(unit=kh2opltc,file='global_h2oprdlos.f77', form='unformatted', convert='big_endian')
      !--- read in indices
      !---
      read (kh2opltc) h2o_coeff, latsh2o, levh2o, timeh2o
      write(*,*) 'Reading in h2odata from global_h2oprdlos.f77 '
      write(*,*) ' h2o_coeff = ', h2o_coeff
      write(*,*) ' latsh2o = ', latsh2o
      write(*,*) ' levh2o = ', levh2o
      write(*,*) ' timeh2o = ', timeh2o
   endif read_and_broadcast_1

   call ccpp_bcast(h2o_coeff, mpiroot, mpicomm, ierr)
   call ccpp_bcast(latsh2o, mpiroot, mpicomm, ierr)
   call ccpp_bcast(levh2o, mpiroot, mpicomm, ierr)
   call ccpp_bcast(timeh2o, mpiroot, mpicomm, ierr)
   ...
One possible way to provide a generic implementation of ccpp_bcast plus additional, host-specific ones would be to have different files mpiutil.F90, mpiutil.fv3.F90, mpiutil.neptune.F90, ... (see the existing examples for GFS_phys_time_vary etc.). This way, any model that is happy with the generic approach (something as simple as the above?) can use it, and models that need a more elaborate solution can implement their own.
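As a sketch of what such a host-specific variant could look like (the neptune_comm_util module and its neptune_broadcast routine are hypothetical placeholders, not existing code), mpiutil.neptune.F90 would keep the same ccpp_bcast generic interface but delegate the actual communication to the host model's own utilities:

! mpiutil.neptune.F90 -- hypothetical host-specific variant.
! Same public generic interface as the generic mpiutil.F90 above, but the
! broadcasts are delegated to a (hypothetical) host utility module.
module mpiutil

   use iso_fortran_env,   only : real32
   use neptune_comm_util, only : neptune_broadcast   ! hypothetical host module

   implicit none
   private
   public ccpp_bcast

   interface ccpp_bcast
      procedure :: bcast_r32d1
      ! ... further specifics for the other types and ranks ...
   end interface ccpp_bcast

contains

   subroutine bcast_r32d1(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      ! Let the host model handle the communication and its error reporting
      call neptune_broadcast(arr, root, comm, ierr)
   end subroutine bcast_r32d1

end module mpiutil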
This does not address the issue of making code in the CCPP repo dependent on CCPP, however. For this, there are multiple solutions: ifdefs (e.g., guard the opening and closing read_and_broadcast_1 lines and the ccpp_bcast lines in the above example with #ifdef CCPP), stubs, ...
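Applied to the read_h2odata example above, the ifdef option would look roughly like this (a sketch only; without the CCPP macro defined, every MPI task reads the file directly, as the code does today):

#ifdef CCPP
   read_and_broadcast_1: if (mpirank==mpiroot) then
#endif
      open(unit=kh2opltc,file='global_h2oprdlos.f77', form='unformatted', convert='big_endian')
      read (kh2opltc) h2o_coeff, latsh2o, levh2o, timeh2o
#ifdef CCPP
   endif read_and_broadcast_1
   call ccpp_bcast(h2o_coeff, mpiroot, mpicomm, ierr)
   call ccpp_bcast(latsh2o, mpiroot, mpicomm, ierr)
   call ccpp_bcast(levh2o, mpiroot, mpicomm, ierr)
   call ccpp_bcast(timeh2o, mpiroot, mpicomm, ierr)
#endif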
@mwaxmonsky Here is an example with a ton of MPI commands embedded within an initialization routine, rrtmgp_lw_gas_optics.F90 (I never removed the preprocessor directives when mpi_f08 became a requirement, but they could be removed now). This module
- reads the data array sizes from file on the master processor
- allocates space for the data on all processors (this is needed to achieve "statelessness")
- reads the data from file on the master processor and broadcasts it
When we were developing the code we had the error-checking piece, but for brevity we removed it once everything was working. With a CCPP MPI interface like the one Dom mentioned, this routine would look pretty much the same, but
call mpi_bcast()
would be replaced by
call ccpp_bcast()
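For instance, the read-sizes / allocate / read-and-broadcast pattern described above would then look roughly like the sketch below. All names here (load_lw_coeffs, funit, ntemps, npress, kmajor, the file name) are hypothetical placeholders, not the actual rrtmgp_lw_gas_optics.F90 code:

subroutine load_lw_coeffs(mpicomm, mpirank, mpiroot, ierr)
   use iso_fortran_env, only : real64
   use mpiutil,         only : ccpp_bcast
   integer, intent(in)  :: mpicomm, mpirank, mpiroot
   integer, intent(out) :: ierr
   integer :: funit, ntemps, npress
   real(kind=real64), allocatable :: kmajor(:,:)

   if (mpirank==mpiroot) then
      open(newunit=funit, file='coeffs_lw.dat', form='unformatted')
      ! Read the array dimensions on the master processor only
      read (funit) ntemps, npress
   endif
   call ccpp_bcast(ntemps, mpiroot, mpicomm, ierr)
   call ccpp_bcast(npress, mpiroot, mpicomm, ierr)

   ! Allocate on all processors so that every rank holds its own copy of the data
   allocate(kmajor(ntemps, npress))

   if (mpirank==mpiroot) then
      ! Read the data itself on the master processor ...
      read (funit) kmajor
      close(funit)
   endif
   ! ... and broadcast it to all other ranks
   call ccpp_bcast(kmajor, mpiroot, mpicomm, ierr)
end subroutine load_lw_coeffs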
Note to self. The update for the NRL physics, required for the transition of NEPTUNE to operations, was merged in https://github.nrlmry.navy.mil/NEPTUNE/ccpp-physics/pull/3.