New feature: ADIOS2 High Performance Parallel I/O, In-situ Analysis

Open MichaelLaufer opened this issue 3 years ago • 4 comments

Optional high-performance parallel-I/O option added to WRF, enabling in-situ analysis using the ADIOS2 library.

TYPE: new feature

KEYWORDS: ADIOS2, In-situ, I/O, Parallel, High-performance

SOURCE: Michael Laufer (Toga Networks, a Huawei Company), Erick Fredj (The Jerusalem College of Technology, Rutgers University, Toga Networks, a Huawei Company), Joseph Brodie (Rutgers University), Lori Garzio (Rutgers University)

DESCRIPTION OF CHANGES: This PR adds ADIOS2 as a high-performance parallel I/O backend option to WRF. ADIOS2 is a data management library that can transport groups of self-describing variables and attributes across different media types (such as files, wide-area networks, and remote direct memory access). See the ADIOS2 documentation for additional information. Our testing demonstrates an order-of-magnitude performance increase over PnetCDF, and ADIOS2 can even outperform split NetCDF (io_form_* 102) at scale without the need to merge the files back together, while attaining compression ratios similar to NetCDF4 compression. ADIOS2 additionally supports in-situ processing and code coupling without requiring data to pass through the file system. A complete description of this new I/O backend, as well as some of the new features now accessible in WRF, can be found in our research article (https://arxiv.org/abs/2201.08228). A script that converts the ADIOS2 file format to the NetCDF4 file format is also provided for backward compatibility. Details on configuring and running can be found in the file doc/README.adios2.

LIST OF MODIFIED FILES:
M Registry/Registry.CONVERT
M Registry/Registry.EM_COMMON
M Registry/Registry.EM_COMMON.var
M Registry/registry.io_boilerplate
M arch/Config.pl
M arch/postamble
M arch/preamble
M configure
M external/Makefile
A external/io_adios2/Makefile
A external/io_adios2/ext_adios2_get_dom_ti.code
A external/io_adios2/ext_adios2_get_var_td.code
A external/io_adios2/ext_adios2_get_var_ti.code
A external/io_adios2/ext_adios2_put_dom_ti.code
A external/io_adios2/ext_adios2_put_var_td.code
A external/io_adios2/ext_adios2_put_var_ti.code
A external/io_adios2/field_routines.F90
A external/io_adios2/transpose.code
A external/io_adios2/wrf_io.F90
M external/ioapi_share/wrf_status_codes.h
M frame/md_calls.m4
M frame/module_io.F
M share/module_io_domain.F
M share/output_wrf.F
M share/wrf_ext_write_field.F

TESTS CONDUCTED:

  • GNU and Intel compilers were used for testing with dmpar and dm+sm. The serial configuration is not applicable, as MPI is required for this implementation.
  • Restarting WRF via the ADIOS2 restart file works properly. The data in subsequent output files matches the data in NetCDF output files.
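
Because the ADIOS2 backend needs MPI, it can help to confirm that MPI compiler wrappers are available before choosing a dmpar or dm+sm option. The following is only a minimal sketch: `have_mpi_wrappers` is a hypothetical helper name, and the wrapper names `mpif90` and `mpicc` are an assumption (some MPI installations ship `mpifort` or vendor-specific wrappers instead).

```shell
# Hypothetical pre-configure check; not part of WRF or this PR.
# Assumes the MPI wrappers are named mpif90 and mpicc.
have_mpi_wrappers() {
    for _tool in mpif90 mpicc; do
        # command -v succeeds only if the tool can be resolved on PATH
        if ! command -v "$_tool" >/dev/null 2>&1; then
            echo "$_tool not found on PATH; load an MPI installation first" >&2
            return 1
        fi
    done
    return 0
}
```

A typical use would be `have_mpi_wrappers && ./configure`, selecting a dmpar or dm+sm option at the prompt.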

RELEASE NOTE: ADIOS2 has been added to WRF as an optional high-performance I/O backend. File write times are substantially lower than with PnetCDF, with compression ratios close to those of NetCDF4. The backend also enables new in-situ processing and code-coupling capabilities through the ADIOS2 library.

MichaelLaufer avatar Feb 03 '22 10:02 MichaelLaufer

Test Type              | Expected  | Received |  Failed
= = = = = = = = = = = = = = = = = = = = = = = =  = = = =
Number of Tests        : 23           24
Number of Builds       : 60           58
Number of Simulations  : 158           156        0
Number of Comparisons  : 95           92        0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

weiwangncar avatar Feb 08 '22 03:02 weiwangncar

Hi, the ADIOS2 dev team has released a new version of the library (v2.8.0), which includes the Fortran APIs needed for our implementation as well as further performance improvements, so the ADIOS2 master branch is no longer required. The doc/README.adios2 has been updated accordingly.

To streamline ADIOS2 installation for testing, I would recommend using Spack. The following should help reviewers and testers set up a simplified testing environment:

# Install dependencies through spack
spack install --only dependencies wrf
spack install --reuse adios2@2.8.0

# Hotfix for netcdf-c and netcdf-fortran being separated
NETCDF=$(spack location -i netcdf-fortran)
NETCDF_C=$(spack location -i netcdf-c)
ln -sf $NETCDF_C/include/*  $NETCDF/include/
ln -sf $NETCDF_C/lib/*  $NETCDF/lib/

# Load dependencies (MPI will be loaded implicitly)
spack load netcdf-c netcdf-fortran parallel-netcdf jasper libpng adios2

# Set environment variables
export NETCDF=$(spack location -i netcdf-fortran)
export PNETCDF=$(spack location -i parallel-netcdf)
export ADIOS2=$(spack location -i adios2)
export JASPERINC=$(spack location -i jasper)/include
export JASPERLIB=$(spack location -i jasper)/lib

Now configure and compile WRF as normal.
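
Before running ./configure, the exported variables can be sanity-checked. This is a rough sketch rather than anything shipped with WRF or Spack: `check_wrf_env` is a hypothetical helper name, and it only verifies that each variable from the listing above is set and points at an existing directory.

```shell
# Hypothetical helper: verify the variables exported above.
# Checks only that each one is set and names an existing directory.
check_wrf_env() {
    _status=0
    for _name in NETCDF PNETCDF ADIOS2 JASPERINC JASPERLIB; do
        # Look up the value of the variable whose name is stored in _name
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            echo "missing: $_name is not set" >&2
            _status=1
        elif [ ! -d "$_value" ]; then
            echo "missing: $_name=$_value is not a directory" >&2
            _status=1
        fi
    done
    return "$_status"
}
```

For example, `check_wrf_env && ./configure` stops early if any of the paths is missing instead of failing partway through the build.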

Lastly, I am the primary Spack maintainer for WRF, so assuming this makes it into the next WRF release, we will ensure that a single command will be able to compile a fully optimized binary, including the optional ADIOS2 I/O backend:

spack install wrf +adios2

We are happy to assist in pushing this along in any way. Thanks

MichaelLaufer avatar Apr 10 '22 15:04 MichaelLaufer

@weiwangncar @vlakshmanan-scala I see that the CI is failing, but I am unable to see the details of the failure. Can you provide access or post the results here? I am a bit puzzled as to why this would fail, as the only thing that has changed since the successful checks is the documentation :thinking:.

MichaelLaufer avatar Apr 24 '22 20:04 MichaelLaufer

@MichaelLaufer The failed reg test is unlikely to be your problem. We will run the test again soon.

weiwangncar avatar Apr 25 '22 16:04 weiwangncar

@MichaelLaufer Hi Michael, I have recently been trying to do something similar. I followed https://github.com/wrf-model/WRF/pull/1666#issuecomment-1094299905 but got this error during "./compile em_real" (I configured with option 32 (GNU (gfortran/gcc) serial and basic nesting), and I have gcc version 9.4.0):

/lib/cpp -P -nostdinc -DEM_CORE=1 -DNMM_CORE=0 -DNMM_MAX_DIM=2600 -DDA_CORE=0 -DWRFPLUS=0 -DIWORDSIZE=4 -DDWORDSIZE=8 -DRWORDSIZE=4 -DLWORDSIZE=4 -DNONSTANDARD_SYSTEM_SUBR  -DWRF_USE_CLM -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT -DRPC_TYPES=1  -DDM_PARALLEL -DSTUBMPI -DNETCDF -DADIOS2 -DLANDREAD_STUB=1 -DUSE_ALLOCATABLES -Dwrfmodel -DGRIB1 -DINTIO -DKEEP_INT_AROUND -DLIMIT_ARGS -DBUILD_RRTMG_FAST=0 -DBUILD_RRTMK=0 -DBUILD_SBM_FAST=1 -DSHOW_ALL_VARS_USED=0 -DCONFIG_BUF_LEN=65536 -DMAX_DOMAINS_F=21 -DMAX_HISTORY=25 -DNMM_NEST=0 -P -traditional-cpp -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT  -I/home/fengggli/software/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/adios2-2.8.0-e7qsdjlr2rvxdzm4al3bcrprmkeb6kgh/include/adios2/fortran -I../ioapi_share field_routines.F90 > field_routines.f
time gfortran   -O2 -ftree-vectorize -funroll-loops -w -ffree-form -ffree-line-length-none -fconvert=big-endian -frecord-marker=4    -I/home/fengggli/software/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/adios2-2.8.0-e7qsdjlr2rvxdzm4al3bcrprmkeb6kgh/include/adios2/fortran -I../ioapi_share -c field_routines.f
field_routines.f:18:6:

   18 |   use wrf_data_adios2
      |      1
Fatal Error: Cannot open module file ‘wrf_data_adios2.mod’ for reading at (1): No such file or directory
compilation terminated.
Command exited with non-zero status 1

Is it possible some files are missing, or did I miss any steps? I tried both your master and development branch, but got the same error above. If you have updated instructions somewhere, I am also glad to try it out!

Best, Feng

fengggli avatar Nov 03 '22 15:11 fengggli

FYI, I have worked with @fengggli to resolve this. The current PR assumes that ADIOS2 libs are in a /lib64 directory. On most systems it is /lib. I will push a fix for this shortly.
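
Until the fix lands, the library directory can be resolved by hand. The sketch below is an illustration only and not the change that will be pushed: `adios2_libdir` is a hypothetical helper that prefers lib64 and falls back to lib under the given prefix (defaulting to $ADIOS2).

```shell
# Hypothetical helper: print the ADIOS2 library directory.
# Prefers <prefix>/lib64 and falls back to <prefix>/lib.
adios2_libdir() {
    _prefix="${1:-${ADIOS2:-}}"
    if [ -z "$_prefix" ]; then
        echo "no prefix given and ADIOS2 is not set" >&2
        return 1
    fi
    for _dir in "$_prefix/lib64" "$_prefix/lib"; do
        if [ -d "$_dir" ]; then
            printf '%s\n' "$_dir"
            return 0
        fi
    done
    echo "no lib64 or lib directory under $_prefix" >&2
    return 1
}
```

Running `adios2_libdir "$(spack location -i adios2)"` shows which layout a given installation uses.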

MichaelLaufer avatar Nov 15 '22 21:11 MichaelLaufer

We typically prefer to have PRs made from unique, well-named branches; see (3) in the Creating a Branch for Development Work section in the Workflow for WRF Code Modification wiki page.

Since there are already a few suggested changes in the discussion of this PR, it might make the most sense to address those requests in this PR, and once that's done, to create a new branch and open a new PR for that branch. We can then reference the new PR from here, and close this PR.

mgduda avatar Nov 17 '22 23:11 mgduda

@mgduda @fengggli Changes addressing all of the requested topics have been pushed. I have opened #1787, which is rebased on the current development branch and pushed from a more descriptively named branch.

@mgduda Please close this PR, or instruct me to close it, as and when required. Thanks.

MichaelLaufer avatar Nov 20 '22 14:11 MichaelLaufer

This PR has been superseded by PR #1787.

mgduda avatar Nov 22 '22 00:11 mgduda