
Updating hpc-stack modules and miniconda locations for Hera, Gaea, Cheyenne, Orion, Jet

Open natalie-perlin opened this issue 2 years ago • 19 comments

Description

Update the locations of the hpc-stack modules and miniconda3 used for compiling and running the UFS-weather-model on NOAA HPC systems such as Hera, Gaea, Cheyenne, Orion, and Jet. The modules are installed under the role.epic account and placed in a common EPIC-managed space on each system. Gaea also uses an Lmod installed locally in the same common location (ufs-srweather-app/PR-352, ufs-srweather-app/PR-353) and needs to run a script to initialize Lmod before loading the modulefile ufs_gaea.intel.lua. While the ufs-weather-model uses python to a lesser extent, the UFS-srweather-app relies heavily on a conda environment.

For ease of maintenance of the libraries on the NOAA HPC systems, a transition to the new location of the modules built for both the ufs-weather-model and the ufs-srweather-app is needed.

Solution

The ufs-weather-model repo is to be updated with the new locations of miniconda and the hpc-stack libraries.

Updated installation locations have been used to load the modules listed in ufs-weather-model/modulefiles/ufs_common and to build the ufs model binaries.

UPD. 10/20/2022: Modules for Hera and Jet have been built for the already-tested compiler intel/2022.1.2. Modules for the compiler/impi intel/2022.2.0 also remain and can be used when an upgrade is needed.

UPD. 10/24/2022: Modules for the Hera gnu compilers (9.2.0, 10.2.0) with different mpich/openmpi combinations, and with an updated netcdf/4.9.0, have been prepared.

Cheyenne Lmod has been upgraded to v.8.7.13 systemwide after system maintenance on 10/21/2022.

Alternatives

An alternative solution would be to have the hpc libraries and modules built in separate locations for the ufs-weather-model and the ufs-srweather-app. The request from EPIC management, however, was to use a common location for all the libraries.

Related to

A PR-419 in the ufs-srweather-app already exists, and a new PR will be made to the current repo.

Updated locations of the conda/python and hpc-stack modules, and how to load them on each system:

Hera python/miniconda:

    module use /scratch1/NCEPDEV/nems/role.epic/miniconda3/modulefiles
    module load miniconda3/4.12.0

Hera intel/2022.1.2 + impi/2022.1.2:

    module load intel/2022.1.2
    module load impi/2022.1.2
    module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-intel/2022.1.2
    module load hpc-impi/2022.1.2

Hera intel/2022.1.2 + impi/2022.1.2 + netcdf-c/4.9.0:

    module load intel/2022.1.2
    module load impi/2022.1.2
    module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2_ncdf49/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-intel/2022.1.2
    module load hpc-impi/2022.1.2

Hera gnu/9.2 + mpich/3.3.2:

    module load gnu/9.2
    module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-gnu/9.2
    module load mpich/3.3.2
    module load hpc-mpich/3.3.2

Hera gnu/10.2 + mpich/3.3.2:

    module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles
    module load gnu/10.2.0
    module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-gnu/10.2
    module load mpich/3.3.2
    module load hpc-mpich/3.3.2

Hera gnu/10.2 + openmpi/4.1.2:

    module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles
    module load gnu/10.2.0
    module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2_openmpi/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-gnu/10.2
    module load openmpi/4.1.2
    module load hpc-openmpi/4.1.2

Hera gnu/9.2 + mpich/3.3.2 + netcdf-c/4.9.0:

    module load gnu/9.2
    module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_ncdf49/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-gnu/9.2
    module load mpich/3.3.2
    module load hpc-mpich/3.3.2

Hera gnu/10.2 + mpich/3.3.2 + netcdf-c/4.9.0:

    module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles
    module load gnu/10.2.0
    module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2_ncdf49/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-gnu/10.2
    module load mpich/3.3.2
    module load hpc-mpich/3.3.2

Gaea miniconda:

    module use /lustre/f2/dev/role.epic/contrib/modulefiles
    module load miniconda3/4.12.0

Gaea intel: Lmod initialization on Gaea needs to be done first by sourcing the following script:

    source /lustre/f2/dev/role.epic/contrib/Lmod_init.sh

    module use /lustre/f2/dev/role.epic/contrib/modulefiles
    module load miniconda3/4.12.0

    module use /lustre/f2/dev/role.epic/contrib/hpc-stack/intel-2021.3.0/modulefiles/stack
    module load hpc/1.2.0
    module load intel/2021.3.0
    module load hpc-intel/2021.3.0
    module load hpc-cray-mpich/7.7.11

Cheyenne miniconda:

    module use /glade/work/epicufsrt/contrib/miniconda3/modulefiles
    module load miniconda3/4.12.0

Cheyenne intel:

    module use /glade/work/epicufsrt/contrib/miniconda3/modulefiles
    module load miniconda3/4.12.0

    module use /glade/work/epicufsrt/contrib/hpc-stack/intel2022.1/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-intel/2022.1
    module load hpc-mpt/2.25

Cheyenne gnu:

    module use /glade/work/epicufsrt/contrib/miniconda3/modulefiles
    module load miniconda3/4.12.0

    module use /glade/work/epicufsrt/contrib/hpc-stack/gnu11.2.0/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-gnu/11.2.0
    module load hpc-mpt/2.25

Orion miniconda:

    module use /work/noaa/epic-ps/role-epic-ps/miniconda3/modulefiles
    module load miniconda3/4.12.0

Orion intel:

    module use /work/noaa/epic-ps/role-epic-ps/miniconda3/modulefiles
    module load miniconda3/4.12.0

    module use /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-intel/2022.1.2
    module load hpc-impi/2022.1.2

Jet miniconda:

    module use /mnt/lfs4/HFIP/hfv3gfs/role.epic/miniconda3/modulefiles
    module load miniconda3/4.12.0

Jet intel:

    module use /mnt/lfs4/HFIP/hfv3gfs/role.epic/miniconda3/modulefiles
    module load miniconda3/4.12.0

    module use /mnt/lfs4/HFIP/hfv3gfs/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-intel/2022.1.2
    module load hpc-impi/2022.1.2

NB: There were comments in ufs-srweather-app/PR-419 suggesting to roll back to lower compiler versions for Cheyenne gnu (to use 11.2.0 instead of 12.1.0), Hera intel (to use intel/2022.1.2 instead of 2022.2.0), and Jet intel (to use intel/2022.1.2 instead of intel/2022.2.0).

Either way would be OK for the SRW, and the libraries would be built for the lower-version compilers as suggested.

natalie-perlin avatar Oct 19 '22 16:10 natalie-perlin

@natalie-perlin Can you make sure all compiler and library versions are confirmed against https://github.com/ufs-community/ufs-weather-model/tree/develop/modulefiles ?

jkbk2004 avatar Oct 20 '22 01:10 jkbk2004

@ulmononian can we coordinate about intel/gnu/openmpi to hera on this issue?

jkbk2004 avatar Oct 20 '22 02:10 jkbk2004

@jkbk2004 The PRs have not yet been made to address the changes in modulefiles for the ufs-weather-model, only for the ufs-srweather-app.

natalie-perlin avatar Oct 20 '22 12:10 natalie-perlin

The modulefiles for Hera and Jet have been built to use the intel/2022.1.2 version rather than the latest 2022.2.0. Updating the info in the top comment of this issue.

natalie-perlin avatar Oct 20 '22 15:10 natalie-perlin

Can somebody please build the gnu hpc-stack on Hera and Cheyenne using openmpi? Thanks.

DusanJovic-NOAA avatar Oct 20 '22 16:10 DusanJovic-NOAA

@DusanJovic-NOAA @jkbk2004 here is a build i did in the past w/ gnu-9.2.0 & openmpi-3.1.4 on hera:

    module use /scratch1/NCEPDEV/stmp2/Cameron.Book/hpcs_work/libs/gnu/stack_noaa/modulefiles/stack

ulmononian avatar Oct 20 '22 17:10 ulmononian

> @DusanJovic-NOAA @jkbk2004 here is a build i did in the past w/ gnu-9.2.0 & openmpi-3.1.4 on hera: module use /scratch1/NCEPDEV/stmp2/Cameron.Book/hpcs_work/libs/gnu/stack_noaa/modulefiles/stack

Thanks @ulmononian. I also have the gnu/openmpi stack built in my own space. What I was asking for is an installation in the officially supported location, so that we can update the modulefiles in the develop branch.

DusanJovic-NOAA avatar Oct 20 '22 18:10 DusanJovic-NOAA

@ulmononian would you please also create an hpc-stack issue on the UPP repo (https://github.com/noaa-emc/upp)? Other workflows (global workflow, HAFS workflow) may also be impacted by this change. @WenMeng-NOAA @aerorahul @WalterKolczynski-NOAA @KateFriedman-NOAA @BinLiu-NOAA FYI.

junwang-noaa avatar Oct 21 '22 12:10 junwang-noaa

@junwang-noaa @ulmononian @WenMeng-NOAA @aerorahul @WalterKolczynski-NOAA @KateFriedman-NOAA @BinLiu-NOAA @natalie-perlin I noticed that Kyle's old stack installations are still used in other applications and on some machines. I started coordination on the EPIC side. It may take a week or two to finish the full transition. I want to combine this issue with the other ongoing library-update follow-ups: netcdf/esmf, etc.

jkbk2004 avatar Oct 21 '22 12:10 jkbk2004

@jkbk2004 Can you install g2tmpl/1.10.2 for the UPP? Thanks!

WenMeng-NOAA avatar Oct 21 '22 12:10 WenMeng-NOAA

> @jkbk2004 Can you install g2tmpl/1.10.2 for the UPP? Thanks!

@WenMeng-NOAA g2tmpl/1.10.2 is available (current ufs-wm modulefiles), but a backward compatibility issue was captured in issue #1441.

jkbk2004 avatar Oct 21 '22 12:10 jkbk2004

@DusanJovic-NOAA - hpc-stack builds with gnu/9.2.0+mpich/3.3.2 and gnu/10.2.0+mpich/3.3.2 have been installed on Hera under the role.epic account (EPIC-managed space). I am testing them with the ufs-weather-model RTs and plan to include these Hera gnu stacks in the module updates.

The stack installation locations are:

    /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2/
    /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2/

Exact modifications to the modulefiles (paths needed for finding all the modules) will be listed in a subsequent PR(s).

natalie-perlin avatar Oct 21 '22 18:10 natalie-perlin

> @DusanJovic-NOAA - hpc-stack with gnu/9.2.0+mpich/3.3.2 and gnu/10.2.0+mpich/3.3.2 have been installed on Hera under role.epic account (EPIC-managed space). Testing them with ufs-weather-model-RTs, and plan to include these Hera-gnu into the module updates.
>
> The stack installation locations are: /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2/ /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2/
>
> Exact modifications to the modulefiles (paths needed for finding all the modules) will be listed in a subsequent PR(s).

@natalie-perlin Is anyone going to provide gnu/openmpi stack?

DusanJovic-NOAA avatar Oct 21 '22 19:10 DusanJovic-NOAA

> @DusanJovic-NOAA - hpc-stack with gnu/9.2.0+mpich/3.3.2 and gnu/10.2.0+mpich/3.3.2 have been installed on Hera under role.epic account (EPIC-managed space). Testing them with ufs-weather-model-RTs, and plan to include these Hera-gnu into the module updates. The stack installation locations are: /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2/ /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2/ Exact modifications to the modulefiles (paths needed for finding all the modules) will be listed in a subsequent PR(s).

> @natalie-perlin Is anyone going to provide gnu/openmpi stack?

@ulmononian can you install gnu/openmpi parallel to the location above?

jkbk2004 avatar Oct 21 '22 19:10 jkbk2004

@jkbk2004 - do we need all four possible combinations of the compilers (gnu/9.2.0, gnu/10.2.0) with mpich/3.3.2 and openmpi/4.1.2?

natalie-perlin avatar Oct 21 '22 22:10 natalie-perlin

> @jkbk2004 - do we need all four possible combinations of the compilers (gnu/9.2.0, gnu/10.2.0) with mpich/3.3.2 and openmpi/4.1.2?

@natalie-perlin I think @ulmononian has installed gnu10.1/openmpi. That should be good enough as a starting point for the openmpi option. But it makes sense to make the openmpi installation available along the role account path as well.

jkbk2004 avatar Oct 21 '22 22:10 jkbk2004

@jkbk2004, @ulmononian - HPC modules using different versions of gnu, mpich, and openmpi were installed, plus new versions of netcdf 4.9.0 (netcdf-c/4.9.0, netcdf-fortran/4.6.0, netcdf-cxx/4.3.1), for the following combinations:

- gnu/9.2.0 + mpich/3.3.2 + netcdf/4.7.4
- gnu/9.2.0 + mpich/3.3.2 + netcdf/4.9.0
- gnu/10.2.0 + mpich/3.3.2 + netcdf/4.7.4
- gnu/10.2.0 + mpich/3.3.2 + netcdf/4.9.0
- gnu/10.2.0 + openmpi/4.1.2 + netcdf/4.7.4

The updated stack locations are listed in the top comment of this Issue-1465.

natalie-perlin avatar Oct 24 '22 17:10 natalie-perlin

Added a stack build with the intel compiler and netcdf4.9 on Hera (see the list of locations in the top comment)

natalie-perlin avatar Oct 27 '22 16:10 natalie-perlin

@DusanJovic-NOAA @jkbk2004 @natalie-perlin i will install the stack w/ gnu-9.2 and openmpi-3.1.4 here /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs shortly, as well as w/ gnu-10.1 & openmpi-3.1.4 in the official location.

ulmononian avatar Oct 27 '22 17:10 ulmononian

@DusanJovic-NOAA @jkbk2004 @natalie-perlin hpc-stack built w/ gnu-9.2 and openmpi-3.1.4 was installed successfully here: /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4.

ulmononian avatar Oct 28 '22 23:10 ulmononian

I tried running the regression test using gnu-9.2_openmpi-3.1.4 stack but it failed because the debug version of esmf library is missing:

$ module load ufs_hera.gnu_debug
Lmod has detected the following error:  The following module(s) are
unknown: "esmf/8.3.0b09-debug"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "esmf/8.3.0b09-debug"

$ ls -l /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4/modulefiles/mpi/gnu/9.2.0/openmpi/3.1.4/esmf/
total 4
-rw-r--r-- 1 role.epic nems 1365 Oct 28 23:20 8.3.0b09.lua
lrwxrwxrwx 1 role.epic nems   12 Oct 28 23:20 default -> 8.3.0b09.lua

DusanJovic-NOAA avatar Oct 29 '22 14:10 DusanJovic-NOAA

I also tried the 'gnu-10.2_openmpi' stack, but it looks like loading it does not actually load the gnu 10.2 module. I see:

$ module list

Currently Loaded Modules:
  1) miniconda3/3.7.3   10) libpng/1.6.37  19) g2tmpl/1.10.0
  2) sutils/default     11) hdf5/1.10.6    20) ip/3.3.3
  3) cmake/3.20.1       12) netcdf/4.7.4   21) sp/2.3.3
  4) hpc/1.2.0          13) pio/2.5.7      22) w3emc/2.9.2
  5) hpc-gnu/10.2       14) esmf/8.3.0b09  23) gftl-shared/v1.5.0
  6) openmpi/4.1.2      15) fms/2022.01    24) mapl/2.22.0-esmf-8.3.0b09
  7) hpc-openmpi/4.1.2  16) bacio/2.4.1    25) ufs_common
  8) jasper/2.0.25      17) crtm/2.4.0     26) ufs_hera.gnu
  9) zlib/1.2.11        18) g2/3.4.5

Note that there is no gnu/10.2 module loaded. When I run gcc, I see the compiler is version 4.8.5:

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I think this is because, in gnu-10.2_openmpi/modulefiles/core/hpc-gnu/10.2.lua, two lines:

load(compiler)
prereq(compiler)

are missing:

$ cat gnu-10.2_openmpi/modulefiles/core/hpc-gnu/10.2.lua 

...
local compiler = pathJoin("gnu",pkgVersion)

local opt = os.getenv("HPC_OPT") or os.getenv("OPT") or "/opt/modules"
local mpath = pathJoin(opt,"modulefiles/compiler","gnu",pkgVersion)
prepend_path("MODULEPATH", mpath)
...

which are present in:

$ cat gnu-9.2_openmpi-3.1.4/modulefiles/core/hpc-gnu/9.2.0.lua 

...
local compiler = pathJoin("gnu",pkgVersion)
load(compiler)
prereq(compiler)

local opt = os.getenv("HPC_OPT") or os.getenv("OPT") or "/opt/modules"
local mpath = pathJoin(opt,"modulefiles/compiler","gnu",pkgVersion)
prepend_path("MODULEPATH", mpath)
...
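
If that diagnosis is right, a minimal sketch of a corrected gnu-10.2_openmpi/modulefiles/core/hpc-gnu/10.2.lua would simply restore the two lines, mirroring 9.2.0.lua (elided portions kept as `...`; this is a proposed fix inferred from the excerpts above, not the installed file):

```lua
-- ...
local compiler = pathJoin("gnu", pkgVersion)
load(compiler)      -- actually load the matching gnu compiler module
prereq(compiler)    -- and require that it remain loaded

local opt = os.getenv("HPC_OPT") or os.getenv("OPT") or "/opt/modules"
local mpath = pathJoin(opt, "modulefiles/compiler", "gnu", pkgVersion)
prepend_path("MODULEPATH", mpath)
-- ...
```

Note that since gnu/10.2.0 is not installed system-wide on Hera, `load(compiler)` can only succeed after `module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles` has made the compiler module visible.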

DusanJovic-NOAA avatar Oct 29 '22 14:10 DusanJovic-NOAA

There is also an unnecessary inconsistency in the naming of the hpc-gnu module between the two versions:

$ ll gnu-9.2_openmpi-3.1.4/modulefiles/core/hpc-gnu/
total 4
-rw-r--r-- 1 role.epic nems 749 Oct 28 22:07 9.2.0.lua
$ ll gnu-10.2_openmpi/modulefiles/core/hpc-gnu/
total 4
-rw-r--r-- 1 role.epic nems 717 Oct 24 12:59 10.2.lua

Why '10.2' and not '10.2.0'? Also, the 9.2 stack directory name includes the openmpi version, while the directory for the 10.2 stack does not.

DusanJovic-NOAA avatar Oct 29 '22 14:10 DusanJovic-NOAA

> I tried running the regression test using gnu-9.2_openmpi-3.1.4 stack but it failed because the debug version of esmf library is missing:
>
>     $ module load ufs_hera.gnu_debug
>     Lmod has detected the following error:  The following module(s) are
>     unknown: "esmf/8.3.0b09-debug"
>
>     Please check the spelling or version number. Also try "module spider ..."
>     It is also possible your cache file is out-of-date; it may help to try:
>       $ module --ignore_cache load "esmf/8.3.0b09-debug"
>
>     $ ls -l /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4/modulefiles/mpi/gnu/9.2.0/openmpi/3.1.4/esmf/
>     total 4
>     -rw-r--r-- 1 role.epic nems 1365 Oct 28 23:20 8.3.0b09.lua
>     lrwxrwxrwx 1 role.epic nems   12 Oct 28 23:20 default -> 8.3.0b09.lua

my apologies, @DusanJovic-NOAA i will install esmf/8.3.0b09-debug in /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4 now and update you when it is finished. we will also address the inconsistency in naming convention and look into the gnu-10.2 modulefile. thank you for testing w/ these stacks.

ulmononian avatar Oct 29 '22 16:10 ulmononian

@DusanJovic-NOAA the stack at /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4 has been updated to include esmf/8.3.0b09-debug. i was able to load ufs_common_debug.lua, so hopefully it works for you now!

ulmononian avatar Oct 29 '22 17:10 ulmononian

@DusanJovic-NOAA, @ulmononian - please note that GNU 10.2.0 is not installed system-wide on Hera; it is only installed locally in EPIC space. It could be built under the current hpc-stack for a particular compiler-gnu-netcdf installation location, but because the compiler is shared between several such combinations, it was moved to a common location outside any given hpc-stack installation.

Please note that the directions for loading the compilers and stack given in the first comment address the way the compiler is loaded! For example, Hera gnu/10.2 + mpich/3.3.2:

    module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles
    module load gnu/10.2.0
    module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2/modulefiles/stack
    module load hpc/1.2.0
    module load hpc-gnu/10.2
    module load mpich/3.3.2
    module load hpc-mpich/3.3.2

natalie-perlin avatar Oct 30 '22 09:10 natalie-perlin

The modulefiles for GNU 10.2.0 had to be manually adjusted to allow a customized location of the gnu/10.2.0 compiler, a path that is only listed when the hpc-stack is requested to load. The stack would not find the compiler "by default", because the modulepath is not known: it is neither the system-wide installation path, nor is it under the given hpc-stack combo.

I hope this resolves the questions about the use of the GNU 10.2.0 compiler!

natalie-perlin avatar Oct 30 '22 09:10 natalie-perlin

@DusanJovic-NOAA - as to the questions about the use of 9.2 vs. 9.2.0, or 10.2 vs. 10.2.0: it is purely for legacy reasons. I did see that previous hpc-stack installations used XX.X abbreviations. However, you do need to give the full version of the compiler as it is installed system-wide, which is 9.2.0 in this case. GNU 10.2.0 was installed in EPIC space to match the gnu/9.2.0 convention, using XX.X.X. If there is a strong preference for the XX.X.X form (as in the system-wide gnu/9.2.0 install), it could relatively easily be done (reinstalled in a new location).

natalie-perlin avatar Oct 30 '22 09:10 natalie-perlin

> @DusanJovic-NOAA the stack at /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_openmpi-3.1.4 has been updated to include esmf/8.3.0b09-debug. i was able to load ufs_common_debug.lua, so hopefully it works for you now!

@ulmononian Thanks for adding the debug build of esmf. I ran the control and control_debug regression tests; both finished successfully. The control test outputs are not bit-identical to the baseline, while the control_debug outputs are identical. I guess this is expected due to the different MPI library.

DusanJovic-NOAA avatar Oct 31 '22 13:10 DusanJovic-NOAA

> @DusanJovic-NOAA, @ulmononian - please note that the GNU 10.2.0 is not installed system-wide on Hera, and only installed locally in EPIC space. It could be build under the current hpc-stack for a particular compiler-gnu-netcdf installation location, but because the compiler is shared between several of such combinations, it is moved to a common location outside a given hpc-stack installation.
>
> Please note that directions to load the compilers and stack given in the first comment address the way the compiler is loaded! For example, Hera gnu/10.2 + mpich/3.3.2 : module use /scratch1/NCEPDEV/nems/role.epic/gnu/modulefiles module load gnu/10.2.0 module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-10.2/modulefiles/stack module load hpc/1.2.0 module load hpc-gnu/10.2 module load mpich/3.3.2 module load hpc-mpich/3.3.2

@natalie-perlin I tried to run the control and control_debug tests after loading the gnu module from the location above (thanks for explaining this, I missed that in the description). The control test compiled successfully, but failed at run time:

+ sleep 1                                                                                                                            
+ srun --label -n 160 ./fv3.exe                                                                                                      
  1: [h12c01:06674] OPAL ERROR: Unreachable in file ../../../../../opal/mca/pmix/pmix3x/pmix3x_client.c at line 112                  
 90: [h20c56:12037] OPAL ERROR: Unreachable in file ../../../../../opal/mca/pmix/pmix3x/pmix3x_client.c at line 112                  
 55: [h12c04:153910] OPAL ERROR: Unreachable in file ../../../../../opal/mca/pmix/pmix3x/pmix3x_client.c at line 112                 
144: [h21c53:84991] OPAL ERROR: Unreachable in file ../../../../../opal/mca/pmix/pmix3x/pmix3x_client.c at line 112                  
....
 38: [h12c01:06711] OPAL ERROR: Unreachable in file ../../../../../opal/mca/pmix/pmix3x/pmix3x_client.c at line 112                  
 43: --------------------------------------------------------------------------                                                      
 43: The application appears to have been direct launched using "srun",                                                              
 43: but OMPI was not built with SLURM's PMI support and therefore cannot                                                            
 43: execute. There are several options for building PMI support under                                                               
 43: SLURM, depending upon the SLURM version you are using:                                                                          
 43:                                                                                                                                 
 43:   version 16.05 or later: you can use SLURM's PMIx support. This                                                                
 43:   requires that you configure and build SLURM --with-pmix.                                                                      
 43:                                                                                                                                 
 43:   Versions earlier than 16.05: you must use either SLURM's PMI-1 or                                                             
 43:   PMI-2 support. SLURM builds PMI-1 by default, or you can manually                                                             
 43:   install PMI-2. You must then build Open MPI using --with-pmi pointing                                                         
 43:   to the SLURM PMI library location.                                                                                            
 43:                                                                                                                                 
 43: Please configure as appropriate and try again.                                                                                  
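
For reference, this failure mode means srun launched the ranks but the Open MPI in this stack was built without Slurm PMI/PMIx support. A hedged sketch of the usual checks and workarounds (the task count and the PMI library path below are illustrative assumptions, not taken from this thread):

```shell
# See which PMI plugins this Slurm installation offers to srun
srun --mpi=list

# Workaround 1: launch with Open MPI's own launcher instead of srun,
# which does not need Slurm-side PMI support
mpirun -np 160 ./fv3.exe

# Workaround 2 (for the stack maintainers): rebuild Open MPI with Slurm
# PMI support; the prefix below is a placeholder for the real PMI location
./configure --with-slurm --with-pmi=/path/to/slurm/pmi
make -j8 && make install
```

Either workaround sidesteps the "OMPI was not built with SLURM's PMI support" condition reported above; rebuilding the stack's Open MPI is the durable fix.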

DusanJovic-NOAA avatar Oct 31 '22 13:10 DusanJovic-NOAA