Support SWMR in HDF5 (updated)
I've taken the work @eivindlm started in #1448, and done the following:
- merged in recent `main` and fixed conflicts
- removed use of deleted macro
- added `--enable-hdf5-swmr` option to `configure.ac`
- changed the use of `add_definition` in CMake to defining a macro in `config.h`, to be consistent with the autotools version
- fixed a bug where `H5Pset_libver_bounds` wasn't set when opening a file
- replaced calls to `H5Fstart_swmr_write` with file access flags; this was required to get the tests to pass
This feature needs some documentation, as it's not quite straightforward to use. Where's best to put that?
Should this also have a feature flag set in `netcdf_meta.h`, and in `nc-config`?
Given that this feature needs explicitly enabling at configure time, should it be added to an existing/new CI build?
Awesome work!
Definitely add a flag to netcdf_meta/nc-config for this!
Ok, I've added `NC_HAS_HDF5_SWMR` to `netcdf_meta.h` and `--has-hdf5-swmr` to `nc-config`.
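For downstream code, a hedged sketch of how such a flag could be checked at compile time (the macro name is the one added above; the guard itself is purely illustrative, assuming the macro is defined to 0 or 1 like the other `NC_HAS_*` feature macros):

```c
/* Illustrative compile-time guard for a downstream project, assuming the
 * NC_HAS_HDF5_SWMR macro added to netcdf_meta.h in this branch. */
#include <netcdf_meta.h>

#if !defined(NC_HAS_HDF5_SWMR) || !NC_HAS_HDF5_SWMR
#error "this tool needs a netcdf-c build configured with --enable-hdf5-swmr"
#endif
```

At build-configuration time, the same information would come from `nc-config --has-hdf5-swmr`.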
HDF5 has lots of features that are not supported in netcdf-c. In order to include such features, we need evidence that they are in high demand.
@DennisHeimbigner SWMR is a very useful feature for HPC users. It is frequently the case that a model is writing results and other processes want to see those results. To calculate the next timestep, all processors need some data from the previous timestep. The simplest solution would be for those reading processes to read the file and get what they want, but without SWMR it will not work.
A NOAA group just asked about this recently, so I believe it is a feature that would be used by multiple modeling groups. There's also a good chance that PIO would allow access to this feature, and PIO is used in several important models at NOAA and NCAR.
@gsjaardema what do you think about SWMR?
The inability to watch netCDF4 files while they are being written is the main complaint I get from researchers when upgrading applications and libraries. Using SWMR removes that limitation. This is a pretty basic QoL feature that would benefit a lot of users.
It would be even better if this feature could be enabled and used completely automatically, but it's not clear whether that's even possible.
> @gsjaardema what do you think about SWMR?
I agree with @ZedThree that many users want to be able to query the HDF5 file while it is being written and the inability to do that is a big complaint. Especially when it works for netcdf-3 and netcdf-5 but not netcdf-4. Explaining that to users is confusing.
At one time, SWMR did not work with parallel files; I don't remember if that has changed yet. We would at a minimum need to support parallel writing with serial read; parallel write and parallel read would be nice. Serial write/serial read which SWMR definitely supports is also useful for us though.
There is also a question of overhead -- does use of SWMR entail any performance hit on writing? If none or minimal, then I could see using it at all times; if there is a performance hit, then we would probably need it to be turned on/off on a file-by-file basis.
Just to be clear, in case there's any confusion, this is a completely opt-in feature currently, and requires netCDF to be configured and built with explicit support, as well as a runtime flag when opening the file.
I think SWMR is incompatible with parallel files, and only works in serial. I have seen some references to "VFD SWMR", "full SWMR", and "MWMR" (multiple writer, multiple reader), but I'm not sure of the availability of these features.
Hi all, great stuff! I was just wondering if there have been any updates on this work regarding merging into the main branch, or any other initiatives in NetCDF over the past six months that do something similar? Very keen to be able to read NetCDF4 model result files (in QGIS for example) while a hydrodynamic model is still running. The ability to review results 'on-the-fly' as a model is running is an important part of flood and coastal modelling workflows, and this functionality would provide great value to researchers and practitioners.
> replaced calls to `H5Fstart_swmr_write` with file access flags; this was required to get the tests to pass
Where is this change? I do not see it in the files changed for this PR.
> I was just wondering if there have been any updates on this work regarding merging into the main branch, or any other initiatives in NetCDF over the past six months that do something similar?
It looks like this branch has fallen out of date a little bit; I'll try and fix that up. I don't think there has been any other work on this, though.
If you're able, it would be great if you could try out this branch and see if there are any major issues I've missed!
> Where is this change? I do not see it in the files changed for this PR.
This is just a difference in implementing this feature between #1448 and this PR, so it doesn't appear in the diff between this branch and `main`.
In `libhdf5/hdf5open.c` in #1448 there was:
```c
if (mode & NC_WRITE && mode & NC_HDF5_SWMR) {
    if ((retval = H5Fstart_swmr_write(h5->hdfid))) {
        BAIL(retval);
    }
}
```
and in this PR I instead changed the file access flags:
```c
if ((mode & NC_HDF5_SWMR)) {
    flags |= H5F_ACC_SWMR_WRITE;
}
```
The former starts SWMR mode after the file has been opened; the latter opens the file immediately in SWMR mode. I don't recall exactly why this made any difference, except that it was required to get the tests to pass.
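For anyone following along outside the netcdf-c internals, here is a hedged, standalone sketch of the two approaches in plain HDF5 (the helper function names are invented for illustration; both paths require the file to use the 1.10+ file format):

```c
#include <hdf5.h>

/* Approach in #1448: open the file read-write, then switch it into
 * SWMR-write mode afterwards. */
static hid_t open_then_start_swmr(const char *path)
{
    hid_t fid = H5Fopen(path, H5F_ACC_RDWR, H5P_DEFAULT);
    if (fid >= 0 && H5Fstart_swmr_write(fid) < 0) {
        H5Fclose(fid);
        return -1;
    }
    return fid;
}

/* Approach in this PR: open the file directly in SWMR-write mode via the
 * file access flags. */
static hid_t open_with_swmr_flag(const char *path)
{
    return H5Fopen(path, H5F_ACC_RDWR | H5F_ACC_SWMR_WRITE, H5P_DEFAULT);
}
```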
Hi @ZedThree (and an early disclaimer here about me being a hack), I've been testing for the past few days and have compiled #2652 on top of HDF5 (tests completed on both HDF5 1.10.11 and 1.14.1-2). For the most part it compiles successfully; however, `tst_filter` appears to fall over -- see the output at the bottom of this comment. Any advice is certainly welcome.
Despite `tst_filter`, I've pushed on. Our program uses netcdf-fortran to write results, so to try and test your commits I've built v4.6.1 (https://github.com/Unidata/netcdf-fortran/tree/v4.6.1) on top of #2653. Sadly, I appear to still be getting file lock issues (no doubt user error here...).
Are you aware of anything within the netcdf-fortran libs that will also need to be updated to enable SWMR, or potentially any additional flags I should be setting when first opening our result files via nf90_create? Happy to send more information through. Many thanks for your help. Cheers, Mitch.
```
=======================================================
netCDF 4.9.3-development: nc_test4/test-suite.log
=======================================================
TOTAL: 83
PASS:  82
SKIP:  0
XFAIL: 0
FAIL:  1
XPASS: 0
ERROR: 0

.. contents:: :depth: 2

FAIL: tst_filter
================
findplugin.sh loaded
final HDF5_PLUGIN_DIR=/home/mjs/dev/tuflowfv_netcdf/tuflowfv/tuflow_shared_external/netcdf-c/plugins/.libs
filter szip not available
*** Testing dynamic filters using API
*** Testing API: bzip2 compression. show parameters for bzip2: level=9 show chunks: chunks=4,4,4,4
*** Testing API: bzip2 decompression. data comparison: |array|=256 no data errors
*** Pass: API dynamic filter
*** Testing dynamic filters parameter passing
test1: compression. test: nparams=14: params= 1 239 23 65511 27 77 93 1145389056 3287505826 1097305129 1 2147483648 4294967295 4294967295 dimsizes=4,4,4,4 chunksizes=4,4,4,4
mismatch: tint64 mismatch: tuint64 mismatch: tfloat64
test1: decompression. mismatch: tint64 mismatch: tuint64 mismatch: tfloat64 data comparison: |array|=256 no data errors
test2: dimsize % chunksize != 0: compress. test: nparams=14: params= 2 239 23 65511 27 77 93 1145389056 3287505826 1097305129 1 2147483648 4294967295 4294967295 dimsizes=4,4,4,4 chunksizes=4,4,4,4 nbytes = 1024 chunk size = 1024
test2: dimsize % chunksize != 0: decompress. nbytes = 1024 chunk size = 1024 data comparison: |array|=256 no data errors
test3: dimsize % chunksize != 0: compress. test: nparams=14: params= 3 239 23 65511 27 77 93 1145389056 3287505826 1097305129 1 2147483648 4294967295 4294967295 dimsizes=4,4,4,4 chunksizes=4,4,4,4 test3: error code = 0
test3: dimsize % chunksize != 0: decompress. data comparison: |array|=256 no data errors
*** Pass: parameter passing
*** Testing dynamic filters using ncgen
*** Pass: ncgen dynamic filter
*** Testing dynamic filters using nccopy
*** Testing simple filter application
*** Pass: nccopy simple filter
*** Testing '' filter application
*** Pass: nccopy '' filter
*** Testing 'v&v' filter application
*** Pass: nccopy 'v|v' filter
*** Testing pass-thru of filters
*** Pass: pass-thru of filters
*** Testing -F none
*** Pass: -F none
*** Testing -F var,none
*** Pass: -F var,none
*** Pass: all nccopy filter tests
*** Testing dynamic filters using ncgen with -lc
*** Pass: ncgen dynamic filter
*** Testing multiple filters
*** Testing Multi-filter application: filter set = bzip2 deflate noop filters verified show chunks: chunks=4,4,4,4 direction=compress id=40000 cd_nelmts=0 cd_values=
*** Testing Multi-filters. filters verified direction=decompress id=40000 cd_nelmts=0 cd_values= data comparison: |array|=256 no data errors
*** nccopy -F with multiple filters direction=compress id=40000 cd_nelmts=0 cd_values=
*** ncgen with multiple filters
*** Pass: multiple filters
*** Testing filter re-definition invocation
*** Testing multiple filter order of invocation on create
7a8,9
direction=compress id=40000 cd_nelmts=1 cd_values= 0
direction=compress id=40001 cd_nelmts=1 cd_values= 1
11,12d12
< direction=compress id=40000 cd_nelmts=1 cd_values= 0
< direction=compress id=40001 cd_nelmts=1 cd_values= 1
```
@Mitch3007 Thanks for taking a look! I'm not sure about the test errors -- it passes everything in CI, and this feature shouldn't interact with the filters at all, so it's possible there's an issue with your environment.
> Are you aware of anything within the netcdf-fortran libs that will also need to be updated to enable SWMR, or potentially any additional flags I should be setting when first opening our result files via nf90_create?
Ah yes, the Fortran API will need the `NC_HDF5_SWMR` flag exposed. Currently it uses the same value as the deprecated MPIIO flag, so for now you can use `NF90_MPIIO`. There will need to be a corresponding `NF90_HDF5_SWMR` if this gets merged.
There are currently no docs for this feature, but to use it properly you should first open the file normally and create all the variables and attributes. Then close and reopen the file in SWMR mode (`nc_open(FILE_NAME, NC_WRITE|NC_HDF5_SWMR, &ncid)`) to append your data. At this point, you should be able to open the file for reading in a different process, also with the SWMR flag.
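To make that workflow concrete, here is a minimal, hedged sketch of the writer side in C, assuming the `NC_HDF5_SWMR` flag from this branch; the file name, dimensions, and variable are made up for the example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define CHECK(e) do { int _s = (e); if (_s != NC_NOERR) { \
    fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

int main(void)
{
    int ncid, dimids[2], varid;

    /* 1. Create the file and define all dimensions/variables up front;
     *    no metadata changes are possible once SWMR mode is active. */
    CHECK(nc_create("swmr_example.nc", NC_CLOBBER | NC_NETCDF4, &ncid));
    CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]));
    CHECK(nc_def_dim(ncid, "x", 10, &dimids[1]));
    CHECK(nc_def_var(ncid, "pressure", NC_DOUBLE, 2, dimids, &varid));
    CHECK(nc_enddef(ncid));
    CHECK(nc_close(ncid));

    /* 2. Reopen in SWMR mode to append data. */
    CHECK(nc_open("swmr_example.nc", NC_WRITE | NC_HDF5_SWMR, &ncid));
    CHECK(nc_inq_varid(ncid, "pressure", &varid));

    /* 3. Append one record per "timestep"; a reader in another process,
     *    also opened with the SWMR flag, can follow along. */
    double record[10] = {0};
    for (size_t t = 0; t < 5; t++) {
        size_t start[2] = {t, 0};
        size_t count[2] = {1, 10};
        CHECK(nc_put_vara_double(ncid, varid, start, count, record));
        CHECK(nc_sync(ncid)); /* flush so the new record becomes visible */
    }

    CHECK(nc_close(ncid));
    return 0;
}
```

The `nc_sync` call is a guess at how to make new records visible promptly; whether it is strictly required depends on how often the library flushes internally.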
Reading through the HDF5 docs, I realise this PR is actually missing a crucial part: a call to `H5Drefresh`. This needs to be called from the reading process to refresh its metadata for a dataset. I'll add this to `NC4_get_vars`.
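For context, here is a similarly hedged sketch of the reader side once that refresh is in place; the file and variable names mirror the writer sketch above and are purely illustrative:

```c
#include <stdio.h>
#include <unistd.h>
#include <netcdf.h>

int main(void)
{
    int ncid, varid, timedim;
    size_t nrecords = 0, seen = 0;

    /* Open the file that another process is writing, also in SWMR mode. */
    if (nc_open("swmr_example.nc", NC_NOWRITE | NC_HDF5_SWMR, &ncid))
        return 1;
    nc_inq_varid(ncid, "pressure", &varid);
    nc_inq_dimid(ncid, "time", &timedim);

    /* Poll the unlimited dimension: its reported length only grows once the
     * dataset metadata has been refreshed on the reader side, which is what
     * the H5Drefresh call discussed above is meant to provide. */
    for (int i = 0; i < 20; i++) {
        nc_inq_dimlen(ncid, timedim, &nrecords);
        while (seen < nrecords) {
            double record[10];
            size_t start[2] = {seen, 0}, count[2] = {1, 10};
            nc_get_vara_double(ncid, varid, start, count, record);
            printf("record %zu, first value %g\n", seen, record[0]);
            seen++;
        }
        sleep(1);
    }
    return nc_close(ncid);
}
```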
@magnusuMET Yes, I think you're right. Actually, I'm not sure why we can't just sidestep the issue and only call `H5Pset_libver_bounds` when enabling SWMR mode.
@magnusuMET You prompted me to look into this properly -- it turns out that since the original PR (#1448) that this one is based on, a new function `hdf5set_format_compatibility` was added which sets the libver bounds, and that was being called after the new call to `H5Pset_libver_bounds` here, with the result that the values would get clobbered anyway.
The bounds that are currently set (in `main`) have `H5F_LIBVER_LATEST` for the high bound, so we don't have to do anything special for SWMR: simply creating a file in SWMR mode sets the file format to the HDF5 1.10 format (superblock 3).
I've added `nc_test4/test_hdf5_swmr.sh`, which now properly exercises the SWMR capability. Running this test creates separate processes for simultaneously writing to and reading from the same file. This was very useful because it flagged that we need to refresh the metadata when reading dimension lengths. With that implemented, I think this is now the minimally useful version of the feature.
Things that are now missing:
- testing in CI
- some docs on how to actually use this
- maybe `ncwatch`, the equivalent of `h5watch` for watching files (basically `ncdump` but in SWMR mode)
This is currently gated behind a build-time flag (as well as the runtime flag). Does that still make sense? I notice that the CI now only runs on HDF5 1.10.8+, so if 1.8.x is no longer supported, this feature should always be available (though not always turned on of course!). It would also mean not needing an additional CI job.
@WardF @DennisHeimbigner thoughts on how to proceed from here?
Has anyone tested this under Windows (Visual Studio)?
I don't have access to a Windows dev environment myself, unfortunately. Since the Windows tests are running on CI, could we use that?