WRF icon indicating copy to clipboard operation
WRF copied to clipboard

Remove unneeded redundant HDF5 linking

Open islas opened this issue 1 year ago • 11 comments

TYPE: bug fix

KEYWORDS: hdf5, relink, duplicate, netCDF

SOURCE: internal

DESCRIPTION OF CHANGES: Problem: HDF5 linking is handled separately by HDF5(configure)/sw_hdf5_path(Config.pl) and is not required anymore. Even if netCDF has HDF5 compression, those libraries should be provided via nc-config --libs or nf-config equivalent. This default injection of HDF5 causes compilation to fail in environments that don't have HDF5 loaded in path, but would otherwise build fine (ex. Derecho, 32 (serial), module load gcc netcdf).

How does netCDF find HDF5 if it has HDF5 compression but no library is listed now? nc-config when used in its shared library format should dictate which HDF5 to use - https://github.com/Unidata/netcdf-c/issues/1257#issuecomment-466648619. This is important to follow so as to avoid library splicing or overlinking if conflicting HDF5 libraries are specified. Note that this is accomplished via RPATH

For static builds we would use --libs --static - Unidata/netcdf-c#1383. In that case one should provide their own locations for HDF5 and other dependent libraries.

This all likely should be addressed for zlib AND most likely conflicts with specifying HDF5(configure) / HDF5_PATH(configure) / sw_hdf5_path(Config.pl). In the latter case, users probably have either (1) a difficult time building and are maybe not even using the HDF5 they expect, (2) can't build as desired, or (3) have non-HDF5 enabled netCDF to allow supplying their own HDF5 usage within WRF. Solving that issue is outside the scope of this bug and would require discussion of all the methods to supply alternate libraries and their respective benefits/costs.

Solution: Remove non-configurable use of sw_hdf5/$(HDF5)

LIST OF MODIFIED FILES: list of changed files (use git diff --name-status master to get formatted list) M arch/Config.pl M arch/preamble

TESTS CONDUCTED:

  1. On Derecho after module purge, compile with module load gcc netcdf and 32 (serial) should configure properly
  2. Are the Jenkins tests all passing? Waiting

RELEASE NOTE: Remove default HDF5 library linking in lieu of netCDF-informed HDF5 linking

islas avatar Nov 08 '23 20:11 islas

The regression test results:

Test Type              | Expected  | Received |  Failed
= = = = = = = = = = = = = = = = = = = = = = = =  = = = =
Number of Tests        : 23           24
Number of Builds       : 60           57
Number of Simulations  : 158           150        0
Number of Comparisons  : 95           86        0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

weiwangncar avatar Dec 06 '23 05:12 weiwangncar

The regression test results after resolving the conflict:

Test Type              | Expected  | Received |  Failed
= = = = = = = = = = = = = = = = = = = = = = = =  = = = =
Number of Tests        : 23           24
Number of Builds       : 60           57
Number of Simulations  : 158           150        0
Number of Comparisons  : 95           86        0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

weiwangncar avatar Dec 12 '23 01:12 weiwangncar

@islas Perhaps there have been changes to the modules on derecho, or perhaps my process was different, but I didn't find any issues in compiling with option 32 on derecho with the current release-v4.5.2 branch (without the changes in this PR). It didn't look to me like the libraries specified by the HDF5 variable were being included in the final linking steps for the WRF executables.

I seem to recall that there is an HDF5 implementation of WRF's I/O API, and it may be that the HDF5 logic in the build system was put in place for that purpose (and, speculating, perhaps it was later co-opted for NetCDF4 purposes?). If we intend to continue support for the HDF5 option (independent of NetCDF4), then we may want to consider carefully how we modify the logic for HDF5. On the other hand, perhaps we decide that we won't support or maintain the HDF5 implementation of the I/O API moving forward.

mgduda avatar Dec 19 '23 20:12 mgduda

@mgduda These changes were made independent of #1923. Testing with current develop/master/ or where release-v4.5.2 branches from develop should demonstrate the issues in the test/nc4_test.log logfile. As that PR was merged in first, I think it resolved the build issues with forced HDF5 linkage since NETCDF4_DEP_LIB now comes directly from nc-config --libs instead of broken down and reconstructed to specify $(HDF5) $(ZLIB) $(GPFS) $(CURL) manually, with default values being injected.

As that PR removes the in-situ modification of specifying HDF5 libraries for use with netCDF, I think it might still be worth considering this PR since this (the HDF5 linking code in question) is more or less dead code now.

It does look like the HDF5 I/O API assumes to use the NETCDF4_DEP_LIB (here). Whether one was co-opted for the other or vice versa (I think the dep_libs variable (netCDF or not) just became a catch-all for dependencies), the current logic would not have been ideal when using both netCDF and HDF5 as it could potentially load different library versions. The take away here I think should be that the current system already assumed sourcing HDF5 from netCDF only (this may have been overlooked on my part when reviewing #1923) - if there is a use case reinstate support for HDF5 independent of netCDF then the current logic [prior to PR 1923] would only have sufficed in that context and not when both are present, and thus would need to have been revisited / rewritten regardless.

My overarching question is then - should this PR be about removing the traces of hazardous in-situ linkage of HDF5 via co-opting netCDF dependencies (which no longer works), or repairing linkage to HDF5 I/O API when netCDF is not present in such a manner that it does play nicely when it is?

I think both are perfectly valid PRs and push in the right direction, though I view the former as an incremental step to the latter, especially in light of #1923.

islas avatar Dec 19 '23 22:12 islas

@islas @mgduda Should we leave it for 4.6 consideration?

weiwangncar avatar Dec 20 '23 02:12 weiwangncar

@weiwangncar I think that is fine

islas avatar Dec 20 '23 18:12 islas

@islas @mgduda Should we leave it for 4.6 consideration?

I think that seems reasonable. It may turn out that we remove even the HDF5 I/O API code, which seems to be in an unmaintained and untested state.

mgduda avatar Dec 20 '23 21:12 mgduda

@islas I've changed the base branch for this PR to develop. Could you rebase the branch so that only the intended commits are included?

mgduda avatar Dec 21 '23 22:12 mgduda

@mgduda ready for approval?

weiwangncar avatar Jan 24 '24 17:01 weiwangncar

@mgduda @islas I missed this one. Can you again review this, M.?

weiwangncar avatar Feb 01 '24 17:02 weiwangncar

If I understand correctly, WRFDA sometimes requires HDF5 even outside of its use in the WRF I/O API; see the installation instructions. Would this PR affect compilation of WRFDA with HDF5 if the NetCDF libraries were build without HDF5 support?

mgduda avatar Feb 12 '24 23:02 mgduda