add testing with hdf5-1.14.0?

Open edwardhartnett opened this issue 2 years ago • 8 comments

hdf5-1.14.0 was just released. It contains some great performance improvements for HPC systems using compression! ;-)

For more details on these HDF5 improvements see: https://www.hdfgroup.org/2022/03/parallel-compression-improvements-in-hdf5-1-13-1/

Use of the 1.14.0 release seems to resolve the recently raised issue https://github.com/Unidata/netcdf-fortran/issues/389.

So this is wonderful and we are all very happy here at NOAA. But are you guys testing with hdf5-1.14.0? Seems like it works out of the box but I'm not testing exhaustively...

edwardhartnett avatar Jan 19 '23 14:01 edwardhartnett

Hi Ed, glad to hear HPC folk are seeing performance improvements! We aren't testing against it yet, and an out-of-the-box test on macOS is returning some netCDF compilation errors (using a clang-based non-parallel build of hdf5 1.14.0), so I'll need to investigate that first. We'll get to it sooner rather than later, although I'm hoping to get v4.9.1 out shortly!

WardF avatar Jan 19 '23 22:01 WardF

We are seeing two 1.14 HDF5-related issues with NetCDF parallel.

The first relates to an assert being triggered in HDF5 by the parallel NetCDF tests: https://github.com/HDFGroup/hdf5/issues/2433. If I remove the assert, all the NetCDF tests pass with HDF5 1.14.0.

The second relates to a hang when MPI_Info_set is used to set the romio_no_indep_rw hint to "true": https://github.com/HDFGroup/hdf5/issues/2434

brtnfld avatar Feb 07 '23 21:02 brtnfld

@brtnfld what is the assert line?

edwardhartnett avatar Feb 08 '23 08:02 edwardhartnett

nc_create_par will fail in HDF5 with assertion: ../../src/H5Fio.c:397: H5F_shared_vector_write: Assertion `types[i] != H5FD_MEM_GHEAP' failed.

brtnfld avatar Feb 08 '23 15:02 brtnfld

I cannot reproduce this problem on my linux workstation.

Are you using mpich or openmpi?

You are building HDF5 with --enable-parallel and netcdf-c with --enable-parallel-tests?

And you are seeing a failure in the netcdf-c parallel test? Have you tried the HDF5 tests?

edwardhartnett avatar Feb 18 '23 09:02 edwardhartnett

mpich 4.0.2

Yes, HDF5 with --enable-parallel and ../configure --disable-byterange --enable-parallel-tests --enable-logging --prefix=${PREFIX} --enable-cdf5 --enable-netcdf-4 --enable-parallel4

It was a netcdf-c test:

nc_create_par will fail assertion: ../../src/H5Fio.c:397: H5F_shared_vector_write: Assertion `types[i] != H5FD_MEM_GHEAP' failed.

It does not fail any HDF5 tests, but we have a reproducer and a fix at:

https://github.com/HDFGroup/hdf5/pull/2480

brtnfld avatar Feb 20 '23 15:02 brtnfld

OK, that's a quick fix from the HDF5 team! Good work!

Meanwhile, should we also remove this assert from netCDF? Since the HDF5 fix won't come out until the next release, and we want to continue working with older versions of HDF5?

edwardhartnett avatar Feb 25 '23 13:02 edwardhartnett

It is only 1.14.0. The older versions will not have this issue since it was only introduced in 1.13.2. What do you mean by the assert in netCDF? It was only an assert in HDF5.

brtnfld avatar Feb 27 '23 14:02 brtnfld
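
The version range described above (introduced in 1.13.2, still present in 1.14.0) can be sketched as a small check. This is only an illustration: hdf5_gheap_assert_affected and pack_version are hypothetical helpers, not part of netcdf-c or HDF5, and the upper bound assumes the fix ships in the first release after 1.14.0, which the thread does not confirm.

```c
#include <assert.h>
#include <stdbool.h>

/* Pack a version triple into one integer so versions can be compared
 * with ordinary relational operators.
 * Assumption: each component is in the range 0..999. */
long pack_version(int major, int minor, int release)
{
    assert(major >= 0 && major < 1000);
    assert(minor >= 0 && minor < 1000);
    assert(release >= 0 && release < 1000);
    return (long)major * 1000000L + (long)minor * 1000L + (long)release;
}

/* Hypothetical helper, not part of netcdf-c or HDF5.
 * Returns true for HDF5 versions in the range the thread describes:
 * introduced in 1.13.2 and still present in 1.14.0.
 * Assumption: the first release after 1.14.0 carries the fix. */
bool hdf5_gheap_assert_affected(int major, int minor, int release)
{
    long v = pack_version(major, minor, release);
    return v >= pack_version(1, 13, 2) && v <= pack_version(1, 14, 0);
}
```

When building against HDF5, the arguments could come from the H5_VERS_MAJOR, H5_VERS_MINOR, and H5_VERS_RELEASE macros in H5public.h, for example to skip or flag the parallel create tests on an affected library.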

OK, this is working now. I will close this issue.

edwardhartnett avatar Aug 16 '24 11:08 edwardhartnett