GEOS icon indicating copy to clipboard operation
GEOS copied to clipboard

Semi-Automated explicit instantiation of templated functions in separate compilation units

Open rrsettgast opened this issue 1 year ago • 11 comments

What is the requested feature? We have various functions that get called after going through a series of "pass thru" template, which are essentially a large manual if/elseif/.../else block (edit...not true any longer. It is now a recursive function call with an if check to see if if the cast succeeded...see ConstitutivePassThruHandler below ) for calling compile time instantiations of functions from a runtime interface. This is all well and good, except that it results in huge compilation units that compile all permutations of the pass-thru. This has two problems: 1) long compile times that are not able to use parallelism, 2) large memory requirements for these bloated compilation units.

example:

template< typename LAMBDA >
void constitutiveUpdatePassThru( MultiFluidBase const & fluid,
                                 LAMBDA && lambda )
{
  ConstitutivePassThruHandler< DeadOilFluid,
                               BlackOilFluid,
#ifdef GEOSX_USE_PVTPackage
                               CompositionalMultiphaseFluid,
#endif
                               CO2BrinePhillipsFluid,
                               CO2BrineEzrokhiFluid,
                               CO2BrinePhillipsThermalFluid /*, // if I uncomment the two models at the same time, the compiler segfaults on
                                                               Lassen!
                                                               CO2BrineEzrokhiThermalFluid*/>::execute( fluid, std::forward< LAMBDA >( lambda ) );
}

and

template< typename TYPE, typename ... TYPES >
struct ConstitutivePassThruHandler< TYPE, TYPES... >
{
  template< typename BASE, typename LAMBDA >
  static void execute( BASE & relation, LAMBDA && lambda )
  {
    using Derived = add_const_if_t< TYPE, std::is_const< BASE >::value >;

    if( dynamicCast< Derived * >( &relation ) )
    {
      lambda( static_cast< Derived & >( relation ) );
    }
    else
    {
      ConstitutivePassThruHandler< TYPES... >::execute( relation, std::forward< LAMBDA >( lambda ) );
    }
  }
};

Ideally, the lambda function will call another function that is templated on the strong type of fluid that is passed to execute, which is passed to lambda eventually. For instance, `PVTDriver.cpp has :

  constitutiveUpdatePassThru( baseFluid, [&] ( auto & selectedFluid )
  {
    using FLUID_TYPE = TYPEOFREF( selectedFluid );
    runTest< FLUID_TYPE >( selectedFluid, m_table );
  } );

We need to have cmake create a bunch of compilation units that look like:

#include "PVTDriverRunTest.hpp"
#include "CompositionalMultiphaseFluid.hpp"
namespace geosx
{
template void PVTDriver::runTest< constitutive::CompositionalMultiphaseFluid >( constitutive::CompositionalMultiphaseFluid &, arrayView2d< real64 > const & );
}

Is your request related to a specific problem? Long compilation times, Large memory footprints failing compilation on CUDA machines

Describe the solution you'd like We can make a semi-automated process to generate compilation units for each instantiation. The proposed process is as follows:

  1. We take the templated function/s and place them in either a single header, or a collection of stand-alone headers. In the example above, this is the contents of PVTDriverRunTest.hpp.
  2. We creates a cmake list of classnames/filenames that contain the various template parameters in the pass-thru function
  3. We create a "template file" that is the pattern for the compilation units.
  4. We use cmake to use substitution and generate the compilation units....of course we add the generated compilation units to the list of sources for the module we are in.

Describe alternatives you've considered JITC

When assigned, developer should coordinate with @klevzoff and @rrsettgast

rrsettgast avatar Aug 01 '22 23:08 rrsettgast

it results in huge compilation units that compile all permutations of the pass-thru.

I do have a question about this.

The ConstitutivePassThruHandler basically being a long if/else switch, what happens if you write the equivalent if/else code by hand? Does your compilation succeed? Is it significantly faster, lighter?

TotoGaz avatar Aug 02 '22 00:08 TotoGaz

@TotoGaz My statement was out of date. @klevzoff replaced the manual if/elseif/else blocks with a recursive function. I don't think it would have impact. It is really the compilation of all the instantiations on in the same compilation unit, which won't change with the manual block.

rrsettgast avatar Aug 02 '22 16:08 rrsettgast

@rrsettgast So if we are sure that we need to split the compilation unit, then I would tend not to involve an additional player (cmake) in the game and try to keep is pure C++.

Would it be possible to create the C++ macro which would generate the content of your "cmake template file"? Eventually, any new "product registration" would need the creation of the new file by hand (once in a while), and the proper call to the macro.

One the cons, a little more work by hand w.r.t. cmake, on the pros, we never leave the C++ world.

I'm a bit uncomfortable defining some class names in our cmake files (because this is what will hapen, won't it?). I'm sure it will be easier for anyone to "discover" the pattern and maintain it.

TotoGaz avatar Aug 04 '22 22:08 TotoGaz

For what it's worth, here's how Trilinios/Tpetra is solving explicit instantiation problem. Theirs is a bit more difficult, since they need to instantiate for arbitrary combinations of Scalar/Local/Global/Policy types that are configured by the user. Not saying it's the most elegant solution, but clearly it works for them.

  1. A source file that provides a class or function template also defines a macro that will perform the instantiation with a given set of parameters: https://github.com/trilinos/Trilinos/blob/171b69ae0d36c078cde3912ae6ef2ee667019652/packages/tpetra/core/src/Tpetra_BlockCrsMatrix_def.hpp#L3405-L3406
  2. A "generic" translation unit "template" is stored in the repository that simply invokes any macro name substituted into it with parameters also substituted into it: https://github.com/trilinos/Trilinos/blob/master/packages/tpetra/core/src/Tpetra_ETI_SC_LO_GO_NT.tmpl
  3. A CMake function iterates over enabled parameter sets and performs the substitution of macro name and parameters and generates translation units: https://github.com/trilinos/Trilinos/blob/171b69ae0d36c078cde3912ae6ef2ee667019652/packages/tpetra/core/src/CMakeLists.txt#L292
  4. The CMake function is called for a particular class template: https://github.com/trilinos/Trilinos/blob/171b69ae0d36c078cde3912ae6ef2ee667019652/packages/tpetra/core/src/CMakeLists.txt#L708-L710

I think this is very close to what @rrsettgast proposed, with one extra layer of "genericity" due to an additional macro being involved that helps avoid writing a new "template" translation unit for every function/class whose instantiations we want to split. Or maybe this was included in the proposal and I didn't catch it.

klevzoff avatar Aug 05 '22 20:08 klevzoff

Yes, I think it's more or less what @rrsettgast proposed. My question is to know whether or not we need some cmake involved or not in GEOSX for this case.

TotoGaz avatar Aug 05 '22 20:08 TotoGaz

My question is to know whether or not we need some cmake involved or not in GEOSX for this case.

No, I don't think we can avoid involving CMake. We would have to manually stamp out every single separate translation unit by hand, as @rrsettgast has done for PVTDriver prior to merging the raja update PR. I think this issue has been opened exactly because that approach is not desirable. Other than adding translation units by hand, CMake is the only thing that can add files into the build. A C++ language-level macro cannot do this.

klevzoff avatar Aug 05 '22 21:08 klevzoff

CMake is the only thing that can add files into the build

Wouldn't collecting the compilation units in their own folder and telling cmake to take them all be working?

TotoGaz avatar Aug 05 '22 22:08 TotoGaz

Wouldn't collecting the compilation units in their own folder and telling cmake to take them all be working?

Yes, but someone needs to generate compilation units first, i.e. all of the

PVTDriverRunTestCO2BrineEzrokhiFluid.cpp
PVTDriverRunTestCO2BrinePhillipsFluid.cpp
PVTDriverRunTestDeadOilFluid.cpp
...

Currently they've been created manually. A developer adding a new fluid model has to create one of these files and add it to CMakeLists.txt by hand, or they will get a linker error. We're saying that we'd rather have CMake do this for us automatically.

klevzoff avatar Aug 05 '22 22:08 klevzoff

CMake is the only thing that can add files into the build

Wouldn't collecting the compilation units in their own folder and telling cmake to take them all be working?

If I understand correctly, I think this is what should be done. Just place all the separated compilation units into a folder. We would probably manually create the subfolder, and templated function header/s. Then we can have cmake populate the directory with all the compilation units/instantiations it has been instructed to create. I thought it would be fine as the way I had done it in the RAJA update PR knowing that we are certainly going to organize a little better when we address this properly. I think it needs to be pretty soon that we do this.

rrsettgast avatar Aug 05 '22 23:08 rrsettgast

So we can have a cmake "free" solution like

fluid
├── PVTDriverRunTest.hpp
└── impl
    ├── PVTDriverRunTestMacro.hpp
    ├── PVTDriverRunTestCO2BrineEzrokhiFluid.cpp
    ├── PVTDriverRunTestCO2BrinePhillipsFluid.cpp
    └── PVTDriverRunTestDeadOilFluid.cpp

where PVTDriverRunTest*.cpp are written by hand and use PVTDriverRunTestMacro.hpp to do the declaration. Then CMakeLists.txt could use some file GLOB to insert all the cpp files of the impl folder into the build process without further action.

Or a cmake version like:

fluid
├── PVTDriverRunTest.hpp
└── impl
    └── PVTDriverRunTest.cpp.tmpl

and a cmake function that generates the PVTDriverRunTest*.cpp based on the content of template file PVTDriverRunTest.cpp.tmpl and a list of classes provided in the CMakeLists.txt. (BTW I'm not sure we need a impl folder there anymore to be honest: generated files should not be in the source tree, but in the build tree... But we need to be sure there is no collision...).

TotoGaz avatar Aug 06 '22 01:08 TotoGaz

I would say that the directory structure, if any, would be something like

fluid
├── PVTDriverRunTest.hpp
└── PVTDriverRunTest_impl
    ├── PVTDriverRunTestMacro.hpp
    ├── PVTDriverRunTestCO2BrineEzrokhiFluid.cpp
    ├── PVTDriverRunTestCO2BrinePhillipsFluid.cpp
    └── PVTDriverRunTestDeadOilFluid.cpp

as there will be several of these in fluid, and throwing everything into an impl directory might get messy?

rrsettgast avatar Aug 11 '22 00:08 rrsettgast