TriBITS
Fix TPL link ordering when all TPL libs currently exist on link line
TriBITS has long had problems with library link order for TPLs when static libs are used (where the order has to be correct). The underlying problem is that TriBITS TPLs don't know what dependencies they have on each other. For example, if your package only needs basic PETSc functionality, then it should only depend on the PETSc TPL. But if PETSc is built against SuperLU and HYPRE, then just linking against libpetsc.a will result in link failures. That is a very difficult problem to resolve in general because you need a pretty sophisticated system to figure this out automatically when static libraries are involved. That part of the general problem will be addressed by another story, not this one.
The focus of this Story is to address the use cases for TPL library link order that TriBITS technically has enough information to get right but currently messes up. Here is an example where TriBITS currently fails. Say that PackageA depends on TPL_A, PackageB depends on TPL_B, PackageB depends on PackageA, and TPL_B depends on TPL_A. In this case, PackageA would list the dependency on TPL_A. Now, since PackageB only depends on the functionality of TPL_B, it should only have to list the dependency on TPL_B, right? The problem is that currently with TriBITS, a package needs to list all of the TPLs that it depends on, directly or indirectly, when static libraries are used. This was not a big deal when the depth of TPLs in Trilinos was basically LAPACK and BLAS. So to get a correct link line today, the Dependencies.cmake file for PackageB needs to list both TPL_B and TPL_A (even though PackageB has no direct dependence on TPL_A). If it does not, then the current TriBITS implementation provides the library link order:
libpackageb.a libpackagea.a libtpla.a libtplb.a
This results in a link failure because libtplb.a must be listed before libtpla.a. This is what is likely happening, for example, in link order problems with NetCDF and HDF5 in trilinos/Trilinos#156. This is also what is likely happening in trilinos/xSDKTrilinos#2.
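To make the ordering rule concrete, here is a minimal Python sketch (not TriBITS code) that checks a static link line against the dependencies in the example above. It assumes the usual behavior of single-pass linkers, where a static library must appear before the libraries it needs. The function name `first_order_violation` and the `depends_on` table are made up for illustration.

```python
def first_order_violation(link_line, depends_on):
    """Return the first (user, provider) pair listed in the wrong order, else None.

    A static library must come before every library it depends on.
    """
    position = {lib: i for i, lib in enumerate(link_line)}
    for user, providers in depends_on.items():
        for provider in providers:
            if user in position and provider in position:
                if position[user] > position[provider]:
                    return (user, provider)
    return None


# Dependencies from the example: PackageB -> PackageA, TPL_B; PackageA -> TPL_A; TPL_B -> TPL_A.
depends_on = {
    "libpackageb.a": ["libpackagea.a", "libtplb.a"],
    "libpackagea.a": ["libtpla.a"],
    "libtplb.a": ["libtpla.a"],
}

current_order = ["libpackageb.a", "libpackagea.a", "libtpla.a", "libtplb.a"]
wanted_order = ["libpackageb.a", "libpackagea.a", "libtplb.a", "libtpla.a"]

print(first_order_violation(current_order, depends_on))
print(first_order_violation(wanted_order, depends_on))
```

For the current order the check reports the libtplb.a / libtpla.a pair; swapping those two libraries satisfies every constraint.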
The reason for this bad behavior is the way that TriBITS currently manages the link libraries of packages and TPLs. It currently lumps the include directories and link libraries for the TPLs of a given package in with the include directories and link libraries for the package itself. Then, in a downstream package, it first adds the libraries for the upstream packages (with their TPL libraries already mixed in) and only then adds the TPLs for that package. This is obviously a bad implementation.
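The lumping described above can be modeled in a few lines of Python. This is only a sketch of the behavior as described in this issue, not the actual TriBITS implementation; the `PACKAGES` and `TPL_LIBS` tables and the `lumped_libs` function are hypothetical names.

```python
# Hypothetical model of the PackageA / PackageB example.
PACKAGES = {
    "PackageA": {"lib": "libpackagea.a", "packages": [], "tpls": ["TPL_A"]},
    "PackageB": {"lib": "libpackageb.a", "packages": ["PackageA"], "tpls": ["TPL_B"]},
}
TPL_LIBS = {"TPL_A": "libtpla.a", "TPL_B": "libtplb.a"}


def lumped_libs(name):
    """Link libraries for a package, with its TPL libs lumped in with its own."""
    pkg = PACKAGES[name]
    libs = [pkg["lib"]]
    # Upstream packages come first, each carrying its own TPL libs along ...
    for upstream in pkg["packages"]:
        libs += lumped_libs(upstream)
    # ... and only then are this package's TPLs appended.
    libs += [TPL_LIBS[tpl] for tpl in pkg["tpls"]]
    return libs


print(" ".join(lumped_libs("PackageB")))
```

Because libtpla.a travels with libpackagea.a, it lands ahead of libtplb.a on PackageB's link line, which reproduces the failing order from the example.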
In the past, we have hacked this in one of two different ways:
- Have a package list all of the TPLs that are required for correct static linking. That is what was done with BLAS and LAPACK in Trilinos from the very beginning.
- List all the libraries needed for a given TPL. For example, for the NetCDF TPL, you list the libraries netcdf, hdf5, and z. This must be done on a system-by-system basis. This is what is being done for ATTBDevEnv.cmake (see trilinos/Trilinos#172), for example.
This Story will fix the TPL include and link order in cases where all of the correct TPL libraries are already being listed on the link line but in the wrong order. That can be done with some simple tweaks to the data-structures and algorithms used by TriBITS.
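One possible shape for such a tweak, sketched in Python rather than CMake: keep package libraries and TPL libraries in separate lists, and order the TPL libraries at the end using a single global TPL order. This is an assumption for illustration, not necessarily the change that will be made in TriBITS. In particular, `TPL_ORDER` (upstream TPLs listed first) and all function names here are hypothetical.

```python
PACKAGES = {
    "PackageA": {"lib": "libpackagea.a", "packages": [], "tpls": ["TPL_A"]},
    "PackageB": {"lib": "libpackageb.a", "packages": ["PackageA"], "tpls": ["TPL_B"]},
}
TPL_LIBS = {"TPL_A": "libtpla.a", "TPL_B": "libtplb.a"}
# Assumed global TPL order with upstream TPLs first (hypothetical).
TPL_ORDER = ["TPL_A", "TPL_B"]


def upstream_first(name, seen, out):
    """Depth-first post-order walk: every package is appended after its upstreams."""
    if name in seen:
        return
    seen.add(name)
    for upstream in PACKAGES[name]["packages"]:
        upstream_first(upstream, seen, out)
    out.append(name)


def separated_link_line(name):
    """Package libs first (downstream to upstream), then TPL libs (downstream to upstream)."""
    packages = []
    upstream_first(name, set(), packages)
    package_libs = [PACKAGES[p]["lib"] for p in reversed(packages)]
    # Gather the TPLs of all involved packages, kept apart from the package libs.
    needed_tpls = {tpl for p in packages for tpl in PACKAGES[p]["tpls"]}
    tpl_libs = [TPL_LIBS[tpl] for tpl in reversed(TPL_ORDER) if tpl in needed_tpls]
    return package_libs + tpl_libs


print(" ".join(separated_link_line("PackageB")))
```

With the TPLs held back until all package libraries are placed, PackageB only needs to list TPL_B, and libtplb.a ends up ahead of libtpla.a. Note that this only reorders TPLs that some package already lists; it does not discover missing ones.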
Later stories will address the general problem of listing all the static libraries needed for correct linking.
Would this also be fixed by implementing #63 ?
Yes, the TPL ordering issue would likely get fixed automatically as part of #63, but that is a much larger refactoring. This story would be a much faster and easier refactoring than #63. One of the goals of #63 is to eliminate a bunch of TPL-specific code so that Packages and TPLs are handled more uniformly (less TriBITS code, fewer unit tests, less to maintain, etc.). But if that does not work out as smoothly as I hope, then TPL handling may need to be kept separate, and hence this issue will need to be resolved on its own anyway. At some point, TPLs have to be handled differently in order to properly mark their include directories with INCLUDE_DIRECTORIES(SYSTEM ...) to avoid warnings from these header files (see).
Note that if Ninja works out (and there are still several issues with that), then there is less of a need to implement some of the use cases in #63 from a development productivity perspective (but it still makes sense from a deployment productivity perspective). As a matter of fact, keeping everything as one large CMake/TriBITS project will actually be more productive (if the configure times are sufficiently fast and code in upstream packages is not being changed as often).