
Adding support for PIO to FMS for better I/O performance

edhartnett opened this issue Jul 23 '19 • 12 comments

PIO (https://github.com/NCAR/ParallelIO) is an HPC I/O library that has been used in CESM at NCAR, and has recently become an output option for WRF.

PIO allows the same code to use netCDF classic, netCDF-4 parallel, netCDF-4 sequential/compressed, or pnetcdf for input/output, on a file-by-file basis. It also allows designation of an arbitrary number of I/O processors to handle all I/O. It will also allow use of new technologies being added to the netCDF C library, like zarr, and to the HDF5 library, including some other forms of cloud storage. Applications like FMS will not have to change their code to use these new features.

PIO users can easily switch from netCDF classic to pnetcdf to HDF5 to zarr - this could be a run-time decision, or changed at compile time by changing the mode flag in nf_create/nf_open. Furthermore, PIO allows codes to scale to thousands of processors or more, while still providing the performance of parallel I/O from a much more reasonable number of I/O processors.
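
As a rough sketch of what this looks like with PIO's standalone C API (the helper name here is purely illustrative, not existing FMS or PIO code), the backend is nothing more than the iotype argument handed to PIOc_createfile:

```c
#include <netcdf.h>
#include <pio.h>

/* Illustrative helper only: create a file with whichever backend the
 * caller selects.  Nothing else in the calling code needs to change. */
int create_history_file(int iosysid, const char *fname, int iotype, int *ncidp)
{
    /* iotype is one of PIO_IOTYPE_NETCDF, PIO_IOTYPE_PNETCDF,
     * PIO_IOTYPE_NETCDF4C (serial, compressed), or PIO_IOTYPE_NETCDF4P (parallel). */
    return PIOc_createfile(iosysid, ncidp, &iotype, fname, NC_CLOBBER);
}
```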

Right now I am finishing a PIO-netCDF integration project. PIO will become available to existing netCDF code (C or Fortran), via the use of a mode flag in nc_create/nc_open. (Some code changes are required to set up the I/O system, define data decomposition, and do reads/writes of the distributed arrays.) The netCDF-PIO integration will be available for the next released versions of netcdf-c and PIO.

Using this integrated code, I can convert the FMS code without changing most netCDF calls. I see that you already have a data decomposition scheme, and I will have to map it onto PIO's data decomposition functions.
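
For anyone not familiar with PIO, here is a minimal sketch of those steps (set up the I/O system, describe the decomposition, write a distributed array) using PIO's standalone C API. The array size and the simple contiguous block decomposition are made up for illustration; the real mapping would come from FMS's own domain decomposition:

```c
/* Sketch only: write one distributed array with PIO's standalone C API.
 * Build with mpicc, link against PIO/netCDF, run with e.g. 4 MPI tasks. */
#include <mpi.h>
#include <pio.h>

int main(int argc, char **argv)
{
    int my_rank, ntasks, iosysid, ioid, ncid, dimid, varid;
    int iotype = PIO_IOTYPE_NETCDF;
    int gdimlen[1] = {16};               /* global array length (illustration only) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    /* 1. Set up the I/O system; here every task is also an I/O task. */
    PIOc_Init_Intracomm(MPI_COMM_WORLD, ntasks, 1, 0, PIO_REARR_SUBSET, &iosysid);

    /* 2. Describe the decomposition: each task owns a contiguous block
     *    of the global array (map indices are 1-based). */
    int elems = gdimlen[0] / ntasks;     /* assumes ntasks divides 16 evenly */
    PIO_Offset compmap[elems];
    double data[elems];
    for (int i = 0; i < elems; i++)
    {
        compmap[i] = my_rank * elems + i + 1;
        data[i] = my_rank;
    }
    PIOc_InitDecomp(iosysid, PIO_DOUBLE, 1, gdimlen, elems, compmap, &ioid,
                    NULL, NULL, NULL);

    /* 3. Create the file and write the distributed array. */
    PIOc_createfile(iosysid, &ncid, &iotype, "example.nc", NC_CLOBBER);
    PIOc_def_dim(ncid, "x", gdimlen[0], &dimid);
    PIOc_def_var(ncid, "data", PIO_DOUBLE, 1, &dimid, &varid);
    PIOc_enddef(ncid);
    PIOc_write_darray(ncid, varid, ioid, elems, data, NULL);

    PIOc_closefile(ncid);
    PIOc_freedecomp(iosysid, ioid);
    PIOc_finalize(iosysid);
    MPI_Finalize();
    return 0;
}
```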

In order to execute these changes, of course I need the mpp directory to be fully tested, so I will add tests to cover untested code, until we can all be confident that no changes will break existing code.

My plan is to submit these changes as a pull request to FMS. I hope that you will consider it for merging to master, once I have demonstrated its value.

I hope that PIO will allow users to easily try out a variety of different strategies, and use what works best in each case.

edhartnett avatar Jul 23 '19 16:07 edhartnett

Could I suggest that you integrate this into the with-parallel-netcdf branch? It is working in the ocean cases that we tested, has a proven performance benefit at the ~1500 MPI rank/300 Lustre OST regime, has been written up in a manuscript, and has already been submitted as a PR. Since (AFAIK) PIO is heavily based on parallel netCDF, it should share a lot of common code.

If it's possible, I'd also request that you retain this native parallel netCDF support alongside a potential PIO implementation.

marshallward avatar Jul 25 '19 16:07 marshallward

@marshallward I am going to examine your branch closely.

However, the way forward is not for me to integrate more code into an unmerged branch; it is to get your branch merged to master and then work from that.

It is not my intention to remove any capability, but to add PIO as another option.

PIO now works with existing netCDF Fortran code, so it will not be necessary for me to change most of the netCDF calls in FMS in order to enable PIO.

I can also imagine a future iteration of the FMS code where the choice of output I/O is not a build-time decision but a run-time one. Right now this seems to be determined at build time by the setting of various macros.
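
Purely as a hypothetical sketch (none of these names exist in FMS today), the run-time choice could be as small as mapping a configuration string, read from a namelist or environment variable, to a PIO iotype:

```c
#include <string.h>
#include <pio.h>

/* Hypothetical: translate a run-time backend name into a PIO iotype,
 * instead of fixing the backend with preprocessor macros at build time. */
static int iotype_from_string(const char *name)
{
    if (strcmp(name, "netcdf") == 0)   return PIO_IOTYPE_NETCDF;
    if (strcmp(name, "pnetcdf") == 0)  return PIO_IOTYPE_PNETCDF;
    if (strcmp(name, "netcdf4c") == 0) return PIO_IOTYPE_NETCDF4C;
    if (strcmp(name, "netcdf4p") == 0) return PIO_IOTYPE_NETCDF4P;
    return -1;                         /* unknown backend name */
}
```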

I am happy to help you get your branch merged with respect to any build system issues. Since it has already been submitted as a PR, is there a reason it has not been merged?

edhartnett avatar Jul 25 '19 16:07 edhartnett

I think the issue was an ongoing rewrite of the FMS IO framework (in an internal build, I believe), and there was no simple way to integrate the PR into that version. But I expect that some FMS folks can explain this better than I can; hopefully they will chime in.

marshallward avatar Jul 25 '19 16:07 marshallward

Incremental improvement is the way forward. For that we need tests. Once tests are in place, rewrites don't occur - instead refactoring can occur, which is incremental, fast, and tested.

I would like to work on this FMS I/O code, and I need to start by writing tests. Is there a group of FMS programmers with a bunch of changes to the I/O code? Can those changes be merged? Hopefully those changes were developed with tests.

edhartnett avatar Jul 25 '19 16:07 edhartnett

Here is a diagram that helps to explain PIO's capabilities (attached image: I_O_on_Many_Async).

edhartnett avatar Jul 26 '19 01:07 edhartnett

Hi @edhartnett, sorry for not getting back sooner; I was out all afternoon. The plan that you've outlined sounds great.

Re: the PIO diagram, FMS has something like an I/O processor, though it is implemented as a designated MPI rank inside a subdomain of MPI ranks. For example, every Nth MPI rank could be responsible for the I/O of N MPI ranks over each subdomain, though more general Cartesian divisions are possible.

I believe there was a plan to explore an I/O server implementation (XIOS, Met Office UM, etc.), and there has been a desire for something like this in MOM6 to handle some of the diagnostic postprocessing and I/O, so it sounds like PIO could help facilitate that.

marshallward avatar Jul 26 '19 14:07 marshallward

Yes, I see what FMS is doing, with the designated I/O ranks. PIO delivers that capability in a general way.
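
For example, that "every Nth rank does the I/O" pattern corresponds fairly directly to the arguments of PIOc_Init_Intracomm, where the number of I/O tasks, their stride, and the starting rank are explicit. A sketch (the stride of 4 is just an example value):

```c
#include <mpi.h>
#include <pio.h>

/* Sketch: designate every 4th MPI rank as an I/O task, starting at rank 0,
 * roughly analogous to one FMS I/O rank per subdomain of 4 ranks. */
int init_io_subset(MPI_Comm comm, int *iosysid)
{
    int ntasks;
    MPI_Comm_size(comm, &ntasks);

    int stride = 4;                    /* every 4th rank does I/O (example value) */
    int niotasks = ntasks / stride;    /* number of I/O tasks */
    if (niotasks < 1)
        niotasks = 1;                  /* always keep at least one I/O task */

    return PIOc_Init_Intracomm(comm, niotasks, stride, /* base */ 0,
                               PIO_REARR_SUBSET, iosysid);
}
```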

One advantage of PIO is that it will allow pass-through access to all the features added to the netcdf-c library. NF_PIO just becomes another mode flag passed to nf_create(). Anything else the netcdf-c library can do is still available via PIO.

This will include both the netCDF team's zarr efforts and the HDF5 team's DAOS cloud storage work.

edwardhartnett avatar Jul 26 '19 17:07 edwardhartnett

I would really like to start on this, but until we can resolve #98 and #102 it's pointless to invest any effort in this untested code.

edwardhartnett avatar Sep 11 '19 12:09 edwardhartnett

Some in our group have been working on updates to the FMS, MPP, and IO routines for libFMS. The work is still in testing, and we are still working through bugs we have discovered. I'm not sure how much more work is needed. Perhaps @wrongkindofdoctor, @uramirez8707, or @thomas-robinson could tell you more about the status.

However, that updated work is being done in the fms-io-dev branch. I would suggest you begin with that branch.

underwoo avatar Sep 11 '19 13:09 underwoo

@edhartnett How will using PIO like this in the FMS library work in an executable (CESM) that also has components that use PIO directly?

jedwards4b avatar Jan 22 '20 17:01 jedwards4b

Good question!

edwardhartnett avatar Jan 22 '20 17:01 edwardhartnett

This is an important question that I think we need to answer before moving forward, since both the FV3 atmosphere and the MOM ocean are potential CESM components.

jedwards4b avatar Jan 22 '20 17:01 jedwards4b