GEOS
GEOS copied to clipboard
Semi-Automated explicit instantiation of templated functions in separate compilation units
What is the requested feature?
We have various functions that get called after going through a series of "pass thru" template, which are essentially a large manual if/elseif/.../else
block (edit...not true any longer. It is now a recursive function call with an if check to see if if the cast succeeded...see ConstitutivePassThruHandler
below ) for calling compile time instantiations of functions from a runtime interface. This is all well and good, except that it results in huge compilation units that compile all permutations of the pass-thru. This has two problems: 1) long compile times that are not able to use parallelism, 2) large memory requirements for these bloated compilation units.
example:
template< typename LAMBDA >
void constitutiveUpdatePassThru( MultiFluidBase const & fluid,
LAMBDA && lambda )
{
ConstitutivePassThruHandler< DeadOilFluid,
BlackOilFluid,
#ifdef GEOSX_USE_PVTPackage
CompositionalMultiphaseFluid,
#endif
CO2BrinePhillipsFluid,
CO2BrineEzrokhiFluid,
CO2BrinePhillipsThermalFluid /*, // if I uncomment the two models at the same time, the compiler segfaults on
Lassen!
CO2BrineEzrokhiThermalFluid*/>::execute( fluid, std::forward< LAMBDA >( lambda ) );
}
and
template< typename TYPE, typename ... TYPES >
struct ConstitutivePassThruHandler< TYPE, TYPES... >
{
template< typename BASE, typename LAMBDA >
static void execute( BASE & relation, LAMBDA && lambda )
{
using Derived = add_const_if_t< TYPE, std::is_const< BASE >::value >;
if( dynamicCast< Derived * >( &relation ) )
{
lambda( static_cast< Derived & >( relation ) );
}
else
{
ConstitutivePassThruHandler< TYPES... >::execute( relation, std::forward< LAMBDA >( lambda ) );
}
}
};
Ideally, the lambda
function will call another function that is templated on the strong type of fluid
that is passed to execute
, which is passed to lambda
eventually. For instance, `PVTDriver.cpp has :
constitutiveUpdatePassThru( baseFluid, [&] ( auto & selectedFluid )
{
using FLUID_TYPE = TYPEOFREF( selectedFluid );
runTest< FLUID_TYPE >( selectedFluid, m_table );
} );
We need to have cmake create a bunch of compilation units that look like:
#include "PVTDriverRunTest.hpp"
#include "CompositionalMultiphaseFluid.hpp"
namespace geosx
{
template void PVTDriver::runTest< constitutive::CompositionalMultiphaseFluid >( constitutive::CompositionalMultiphaseFluid &, arrayView2d< real64 > const & );
}
Is your request related to a specific problem? Long compilation times, Large memory footprints failing compilation on CUDA machines
Describe the solution you'd like We can make a semi-automated process to generate compilation units for each instantiation. The proposed process is as follows:
- We take the templated function/s and place them in either a single header, or a collection of stand-alone headers. In the example above, this is the contents of
PVTDriverRunTest.hpp
. - We creates a cmake list of classnames/filenames that contain the various template parameters in the pass-thru function
- We create a "template file" that is the pattern for the compilation units.
- We use cmake to use substitution and generate the compilation units....of course we add the generated compilation units to the list of sources for the module we are in.
Describe alternatives you've considered JITC
When assigned, developer should coordinate with @klevzoff and @rrsettgast
it results in huge compilation units that compile all permutations of the pass-thru.
I do have a question about this.
The ConstitutivePassThruHandler
basically being a long if/else
switch, what happens if you write the equivalent if/else
code by hand?
Does your compilation succeed? Is it significantly faster, lighter?
@TotoGaz My statement was out of date. @klevzoff replaced the manual if/elseif/else
blocks with a recursive function. I don't think it would have impact. It is really the compilation of all the instantiations on in the same compilation unit, which won't change with the manual block.
@rrsettgast So if we are sure that we need to split the compilation unit, then I would tend not to involve an additional player (cmake
) in the game and try to keep is pure C++
.
Would it be possible to create the C++
macro which would generate the content of your "cmake template file"?
Eventually, any new "product registration" would need the creation of the new file by hand (once in a while), and the proper call to the macro.
One the cons, a little more work by hand w.r.t. cmake
, on the pros, we never leave the C++
world.
I'm a bit uncomfortable defining some class
names in our cmake
files (because this is what will hapen, won't it?).
I'm sure it will be easier for anyone to "discover" the pattern and maintain it.
For what it's worth, here's how Trilinios/Tpetra is solving explicit instantiation problem. Theirs is a bit more difficult, since they need to instantiate for arbitrary combinations of Scalar/Local/Global/Policy types that are configured by the user. Not saying it's the most elegant solution, but clearly it works for them.
- A source file that provides a class or function template also defines a macro that will perform the instantiation with a given set of parameters: https://github.com/trilinos/Trilinos/blob/171b69ae0d36c078cde3912ae6ef2ee667019652/packages/tpetra/core/src/Tpetra_BlockCrsMatrix_def.hpp#L3405-L3406
- A "generic" translation unit "template" is stored in the repository that simply invokes any macro name substituted into it with parameters also substituted into it: https://github.com/trilinos/Trilinos/blob/master/packages/tpetra/core/src/Tpetra_ETI_SC_LO_GO_NT.tmpl
- A CMake function iterates over enabled parameter sets and performs the substitution of macro name and parameters and generates translation units: https://github.com/trilinos/Trilinos/blob/171b69ae0d36c078cde3912ae6ef2ee667019652/packages/tpetra/core/src/CMakeLists.txt#L292
- The CMake function is called for a particular class template: https://github.com/trilinos/Trilinos/blob/171b69ae0d36c078cde3912ae6ef2ee667019652/packages/tpetra/core/src/CMakeLists.txt#L708-L710
I think this is very close to what @rrsettgast proposed, with one extra layer of "genericity" due to an additional macro being involved that helps avoid writing a new "template" translation unit for every function/class whose instantiations we want to split. Or maybe this was included in the proposal and I didn't catch it.
Yes, I think it's more or less what @rrsettgast proposed.
My question is to know whether or not we need some cmake
involved or not in GEOSX for this case.
My question is to know whether or not we need some
cmake
involved or not in GEOSX for this case.
No, I don't think we can avoid involving CMake. We would have to manually stamp out every single separate translation unit by hand, as @rrsettgast has done for PVTDriver
prior to merging the raja update PR. I think this issue has been opened exactly because that approach is not desirable. Other than adding translation units by hand, CMake is the only thing that can add files into the build. A C++ language-level macro cannot do this.
CMake is the only thing that can add files into the build
Wouldn't collecting the compilation units in their own folder and telling cmake
to take them all be working?
Wouldn't collecting the compilation units in their own folder and telling
cmake
to take them all be working?
Yes, but someone needs to generate compilation units first, i.e. all of the
PVTDriverRunTestCO2BrineEzrokhiFluid.cpp
PVTDriverRunTestCO2BrinePhillipsFluid.cpp
PVTDriverRunTestDeadOilFluid.cpp
...
Currently they've been created manually. A developer adding a new fluid model has to create one of these files and add it to CMakeLists.txt
by hand, or they will get a linker error. We're saying that we'd rather have CMake do this for us automatically.
CMake is the only thing that can add files into the build
Wouldn't collecting the compilation units in their own folder and telling
cmake
to take them all be working?
If I understand correctly, I think this is what should be done. Just place all the separated compilation units into a folder. We would probably manually create the subfolder, and templated function header/s. Then we can have cmake populate the directory with all the compilation units/instantiations it has been instructed to create. I thought it would be fine as the way I had done it in the RAJA update PR knowing that we are certainly going to organize a little better when we address this properly. I think it needs to be pretty soon that we do this.
So we can have a cmake
"free" solution like
fluid
├── PVTDriverRunTest.hpp
└── impl
├── PVTDriverRunTestMacro.hpp
├── PVTDriverRunTestCO2BrineEzrokhiFluid.cpp
├── PVTDriverRunTestCO2BrinePhillipsFluid.cpp
└── PVTDriverRunTestDeadOilFluid.cpp
where PVTDriverRunTest*.cpp
are written by hand and use PVTDriverRunTestMacro.hpp
to do the declaration.
Then CMakeLists.txt
could use some file GLOB
to insert all the cpp
files of the impl
folder into the build process without further action.
Or a cmake
version like:
fluid
├── PVTDriverRunTest.hpp
└── impl
└── PVTDriverRunTest.cpp.tmpl
and a cmake
function that generates the PVTDriverRunTest*.cpp
based on the content of template file PVTDriverRunTest.cpp.tmpl
and a list of classes provided in the CMakeLists.txt
. (BTW I'm not sure we need a impl
folder there anymore to be honest: generated files should not be in the source tree, but in the build tree... But we need to be sure there is no collision...).
I would say that the directory structure, if any, would be something like
fluid
├── PVTDriverRunTest.hpp
└── PVTDriverRunTest_impl
├── PVTDriverRunTestMacro.hpp
├── PVTDriverRunTestCO2BrineEzrokhiFluid.cpp
├── PVTDriverRunTestCO2BrinePhillipsFluid.cpp
└── PVTDriverRunTestDeadOilFluid.cpp
as there will be several of these in fluid
, and throwing everything into an impl
directory might get messy?