netcdf-c icon indicating copy to clipboard operation
netcdf-c copied to clipboard

Support SWMR in HDF5

Open eivindlm opened this issue 6 years ago • 21 comments

This PR is an attempt to add support for the SWMR feature found in HDF5 >= 1.10.

This should allow concurrent access to an hdf5-file from multiple processes, as long as there is a single writer process [1,2]. The usecase is quite common, where a simulation service is writing results to a file, and one or more "readers" wish to get the latest results from the same file. Without swmr-support, this usecase may lead to corrupt files. There have been some requests about this feature in the past [3,4].

The PR is not complete, but it would be very helpful with some feedback at this point. First of all, is SWMR-support something you would like to include in netcdf-c?

According to the SWMR programming model, there are only minor changes required in netcdf-c: a) Create the file with H5F_LIBVER_LATEST b) Calling H5F_start_swmr_write in the writer process c) Open the file with flag H5F_ACC_SWMR_READ in the reader process d) Calling H5Dflush regularly in the writer process

Looking forward to hear your opinion about this.

[1] https://support.hdfgroup.org/HDF5/docNewFeatures/SWMR/HDF5_SWMR_Users_Guide.pdf [2] https://support.hdfgroup.org/HDF5/Tutor/swmr.html [3] https://github.com/Unidata/netcdf4-python/issues/862 [4] https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg13717.html

eivindlm avatar Jul 31 '19 14:07 eivindlm

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 31 '19 14:07 CLAassistant

Current status:

  • I have protected a,b, and c with a new flag NC_HDF5_SWMR. I reused a deprecated hex value for the flag, but the value must probably change.
  • I have added a call to H5Dflush after every variable write, but this should also be wrapped inside a conditional on SWMR-mode.
  • I have implemented a test for sequential access by a reader and a writer inside the same process, but the test should instead spawn a separate reader and writer in order to test concurrency.

eivindlm avatar Jul 31 '19 14:07 eivindlm

My first thought is that performance needs to be checked with and without this. I am particularly concerned with the flushes. This would eleiminate any benefit from buffering writes...

edwardhartnett avatar Jul 31 '19 14:07 edwardhartnett

I should probably remove the flushing from NC4_put_vars, and leave the responsibility of flushing to the user. A user opening a file in swmr-mode, should make sure to call sync_netcdf4_file (or another suitable function) regularly instead.

eivindlm avatar Jul 31 '19 14:07 eivindlm

I think the idea is good, in general.

Is there a way for this to be always-on without breaking anything else? In other words, does this have to be a mode flag - could it just be applied to every file?

edwardhartnett avatar Jul 31 '19 15:07 edwardhartnett

I would suggest that you build with --enable-benchmarks, and you can very easily see any performance impact.

Another question is how does this interact with parallel I/O, if at all?

edwardhartnett avatar Jul 31 '19 15:07 edwardhartnett

@edhartnett

Another question is how does this interact with parallel I/O, if at all?

SWMR does not work with a parallel writer or reader. Currently it is a serial writer, serial reader(s) only capability. However, I do see merit in supporting this as it is very confusing for people to not be able to look at their files as they are being written if in NetCDF4 format when they have had no issues doing this for NetCDF3 files...

Hopefully HDF5 group will get parallel-writer, serial reader SWMR working at some point. I think it is planned for 2020 or 2021...

gsjaardema avatar Jul 31 '19 16:07 gsjaardema

I am thinking about the interactions of this with our internal metadata (i.e. netcdf metadata, not HDF5 metadata). One question concerns R/W to variables that have one or more unlimited dimensions. I think that we track the max size of the unlimited dimension while HDF5 tracks its size for each use in a variable [Ed correct me if I have this wrong]. So, there might be an issue when doing a write increases the size of an unlimited dimension.

DennisHeimbigner avatar Jul 31 '19 17:07 DennisHeimbigner

I believe we do not keep track of the maximum extent of unlimited vars. When we need to know, we check at that time.

edwardhartnett avatar Jul 31 '19 18:07 edwardhartnett

Another, perhaps even more important, feature of this is that it will allow the user to get the best HDF5 performance.

For many users, inability to be read by 1.8 version of HDF5 would not be a big problem. The 1.10 series has been out for years and years now. Not many people should be stuck on 1.8.

In light of that, perhaps the mode flag should be something like NC_HDF5_LATEST.

edwardhartnett avatar Jul 31 '19 18:07 edwardhartnett

Seems to be failing a test called tst_zero_len_var.sh.

edwardhartnett avatar Jul 31 '19 18:07 edwardhartnett

@edhartnett Agree that performance is (sometimes) better with 1.10.X, but there are also many instances where 1.8 is faster which is some of the reason why 1.8.X is still under development. THG is working on finding the performance regressions in 1.10.

Also, there are several clients still using 1.8 (Paraview is/was until latest release) and many systems still ship with 1.8 libraries. I wish this weren't true but have tracked down many issues due to 1.10.X files not being usable in downstream applications.

But, 1.10.X gives some opportunities for vastly improving parallel performance and I definitely advocate for the use of 1.10.X; just some caveats in doing so.

gsjaardema avatar Jul 31 '19 20:07 gsjaardema

Support for HDF5 1.10.x features that live in the free feature set (which should be most of them, as we've learned with discussions recently) can be added, and PR's like this are great! Thank you very much. The issue would be requiring the 1.10.x branch for any netCDF version. Any new functionality which depends upon a particular version of HDF5 beyond the minimum required version (1.8.9 or 1.8.12 at the moment, I'd need to check) requires fence-posting during configuration, so that the features are disabled when they aren't available.

I certainly support increasing the minimum required version, but I agree with @gsjaardema that there are still too many areas where 1.10.x is not on par with 1.8.x; we aren't ready to tell our community that they must switch.

WardF avatar Jul 31 '19 20:07 WardF

Thank you for all the positive response! If I then have understood correctly, I should add ifdefs such that the swmr-code is not compiled with hdf 1.8. And probably remove the call to flush after variable writes (but then I will need some help to expose H5DFlush to the user). Then run benchmarks. Am I on the right track?

eivindlm avatar Aug 02 '19 08:08 eivindlm

I think this would be good to merge, if it passes tests, which looks like they need to be re-run...

edwardhartnett avatar Apr 19 '22 20:04 edwardhartnett

A couple of things:

  1. Is ENABLE_HDF5_SWMR test/set in configure.ac? I do not see that it was changed.
  2. It would be nice if when ENABLE_HDF5_SWMR is set, the HDF5 version is also tested.

DennisHeimbigner avatar Apr 19 '22 20:04 DennisHeimbigner

I haven't reviewed this yet, but it would need to be rebased against the current main branch. Also, @DennisHeimbigner is correct that we would need to fencepost this in configure.ac and CMakeLists.txt such that this functionality is appropriately disabled if/when linking against older versions of libhdf5 that don't support SWMR.

WardF avatar Apr 21 '22 20:04 WardF

My only real objection is that it used up another mode flag.

DennisHeimbigner avatar Apr 21 '22 20:04 DennisHeimbigner

This would be a really useful feature. What's needed to get it over the line, just the option flag adding to configure.ac? I'm happy to look at adding that

ZedThree avatar Mar 06 '23 10:03 ZedThree

I can confirm this would be a really useful feature, especially in HPC (i.e. supercomputer) applications.

If many multiple processes can each open a file for reading, that would be a tremendous savings in complexity.

edwardhartnett avatar Mar 06 '23 11:03 edwardhartnett

I've merged in main, fixed up the conflicts and stuff that's changed (e.g. HAVE_H5PSET_LIBVER_BOUNDS macro has been removed), fixed a bug, and I've added --enable-hdf5-swmr to configure.ac. I obviously can't push to this branch, so would you prefer a PR into this branch, or a whole new PR into main?

There's also some subtlety in how to use SWMR -- all the groups and variables must be defined before enabling it. With this PR, the only mechanism for enabling SWMR is through the file open flag NC_HDF5_SWMR, so in order to use this feature properly, one must create the file and all the variables you wish to modify, then close and reopen it with that flag. That's probably fine, but it would need to be documented.

Another option might be to not pass the flag to HDF5, but call H5Fstart_swmr_write in nc_enddef perhaps. I'm less sure how that would work exactly, there'd need to be some mechanism to tell nc_enddef to enable SWMR.

Also, H5Fstart_swmr_write conflicts with opening the file with H5F_ACC_SWMR_WRITE, and I couldn't find an obvious way to check if a file has been opened in SWMR mode, so that needs to be handled carefully.

ZedThree avatar Mar 06 '23 18:03 ZedThree