qmcpack
qmcpack copied to clipboard
Improve replacement of HDF5 error handler
Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
hdf_archive replaces the default HDF5 error handler with a nullptr, then replaces it in the destructor. https://github.com/QMCPACK/qmcpack/blob/develop/src/io/hdf/hdf_archive.h#L92
This will segfault if called in a openmp parallel loop and also won't work as intended when multiple hdf_archive objects are constructed.
Describe the solution you'd like A clear and concise description of what you want to happen.
Change error handler when QMCPACK starts up.
Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.
Additional context Add any other context or screenshots about the feature request here.
See #4282 for context
Here's how HighFive temporarily silences HDF5 error messages https://github.com/BlueBrain/HighFive/blob/master/include/highfive/H5Utility.hpp
h5cpp has a singleton error handling class. Don't want to add a singleton, but we might get some ideas there.
https://github.com/ess-dmsc/h5cpp/blob/master/src/h5cpp/error/error.hpp
Since the c hdf5 library seems to be a typical C monolyth its seems to be that we both need to interact with it at the application scope for issues like errror handling and whatever scopes own the files. So two classes are needed. Anything like this obviously doesn't belong in the hdf5_archive class. Needs to belong to an object whose lifetime is inside the qmc_main scope.
it seems to me that this is only aspirationally thread safe (i.e. generally not implemented according to the docs) and global as well there is clearly two contexts to deal with when interacting with the hdf5 c library.
Another wrapper that might give us ideas. https://github.com/steven-varga/h5cpp/blob/master/h5cpp/H5Eall.hpp
I believe the original idea was that this was never called from inside an OMP region. Can we require something similiar going forward? Do we have a use case for I/O inside such a region? If we restrict I/O to the start/end of blocks then there is not a problem. This is such a simplification that we should have very good reasons if we take a different route.
@prckent I think this issue extends beyond openmp. If file a.h5 is opened before file b.h5, but b.h5 is closed (and that hdf_archive object is destroyed) after a.h5, the default error handling will be lost.