Third party library support compatible with the AMReX build systems
How to build and effectively manage third-party libraries (TPLs), and how they interact with the AMReX build systems, is currently unclear, at least to me.
For PeleLM, we need to be able to manage a set of TPL builds compatible with the build systems of AMReX. Presently for PeleLM, the libraries include SUNDIALS, SuiteSparse and HYPRE. Like AMReX, the TPL builds need to distinguish and allow correct linking to libraries compiled for all the hardware and system variations, need to correctly deal with path/LD_LIBRARY_PATH issues, and need to pass hardware/system information, where applicable, down to the TPL build scripts. In addition, we need separate control of build options for the TPLs themselves, such as whether SUNDIALS is built with the ability to call the KLU solver in SuiteSparse, or which version of the SUNDIALS distribution to use, in addition to responding to whether USE_CUDA=TRUE or FALSE. Finally, some of the build options collapse: for USE_MPI=TRUE the SUNDIALS libraries need to be built serial, and for USE_OMP=TRUE they need to be built without direct OMP support.
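As a rough illustration of what that translation of options might look like (a sketch only; the function name and the exact SUNDIALS CMake option names are placeholders and depend on the SUNDIALS version):

```python
# Illustrative sketch only: translate AMReX-style flags into SUNDIALS CMake
# options, including the "collapsed" cases. The exact option names depend on
# the SUNDIALS version and are placeholders here.
def sundials_cmake_args(use_cuda, use_klu, suitesparse_root=""):
    args = [
        "-DMPI_ENABLE=OFF",     # collapsed: SUNDIALS is built serial even when USE_MPI=TRUE
        "-DOPENMP_ENABLE=OFF",  # collapsed: no direct OMP support even when USE_OMP=TRUE
        "-DCUDA_ENABLE=" + ("ON" if use_cuda else "OFF"),
    ]
    if use_klu:  # optional coupling to the KLU solver in SuiteSparse
        args += ["-DKLU_ENABLE=ON",
                 "-DKLU_INCLUDE_DIR=" + suitesparse_root + "/include",
                 "-DKLU_LIBRARY_DIR=" + suitesparse_root + "/lib"]
    return args
```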
There needs to be a simple process to create the TPLs for a particular build configuration, and to clean and/or rebuild them as necessary, and the process needs to be manageable for new users unfamiliar with the AMReX build systems. Spack is NOT currently an acceptable solution, due to the complexity of setting up one's system to manage this level of flexibility. CMake could be a solution, provided there were a mechanism in place to generate and update the necessary files throughout, e.g., the PeleLM, PelePhysics, IAMR, PeleC and amrex repos, and to automatically incorporate changes that are made frequently in the amrex CMake system, such as those required to support new features (e.g. HIP or oneAPI), new architectures, or changed file names.
For a concrete example, it would be good to get GPU and non-GPU versions of PeleLM compiled and running on Godzilla, Garuda, Cori and Summit, with and without SUNDIALS, and, when SUNDIALS is supported, with and without KLU support, in DEBUG and non-DEBUG versions. It would be extremely helpful to have this capability coupled to CI, so that we can automate the testing across platforms with Travis and/or NERSC CI.
The priority of this is high: the combustion team is trying to port PeleLM to GPUs to meet an upcoming milestone, and this will allow us to test the various combinations to ensure a correct port. There is a strategy for all of this currently in place, but it is brittle and unreliable.
It would be good to break this into smaller steps.
Can you share a very specific example of what went from working to broken due to recent amrex changes? Are the problems only with USE_CUDA = TRUE or also with pure CPU?
Are you compiling SUNDIALS as a library first and then using it in the AMReX build system, or having the AMReX build system compile the SUNDIALS files? The latter might be hard.
@asalmgren No, I can't. This has not worked yet. The problems are as written: creating TPLs compatible with AMReX builds, and then linking to them.
@WeiqunZhang I am trying any way I can to manage the process. I have a system now in which the AMReX make system calls an external makefile, passing options along.
Recursive make is hard. It seems to me it would be easier if you compile the TPLs separately using their own build systems and then use them as libraries. You might build multiple versions and add some make flags in the PeleLM makefile to control which version of the libraries gets used.
I believe Nyx does what Weiqun suggested -- builds AMReX and SUNDIALS separately -- Jean can you confirm?
If I do that, then to make two different versions to allow debug comparison testing, I need to make two completely separate copies of the 4 repos and 3 TPLs and make sure all the run files are identical. My goal is something like running MPI vs no-MPI, where a pair of executables can be built with USE_MPI=TRUE and USE_MPI=FALSE, live in the same folder, and use the same files. Moreover, if I want to check for asserts, I need to make a third set of the 4 repos and 3 TPLs to see if DEBUG=TRUE uncovers anything. I can do this now. I'm searching for a way to avoid doing that, and the potential confusion and mistakes that it would lead to.
Just for clarity -- is this something that used to work but got broken recently, or an "aspirational" build system?
Nyx does what Weiqun mentioned for SUNDIALS: we use CMake to compile the SUNDIALS libraries, and link against them when using GNU make to create a Nyx executable. For the on-the-fly halo finding, we link against 2 TPLs using GNU make to create a Nyx executable (there's a post-processing version that operates on plotfiles and uses an amrex CMake-built library to make a TPL executable using the TPL's CMake). The Ascent/Conduit compile path currently uses those libraries built with CMake, and uses GNU make to create a Nyx executable. For robustness, it would be nice if there were a clean/trustworthy way to make sure that the correct performance of the different TPLs doesn't depend too heavily on MPI+OMP vs MPI+CUDA configurations, or on a particular CUDA or compiler choice.
"aspirational"?? It's not something that used to work. It is the result of working with our collaborators and trying to incorporate their work into ours, as they recommend. In this particular example, Nyx does not need something like KLU because their "matrix" is tiny. CASTRO doesn't need KLU because Don wrote an in-place inversion for his systems.
More generally, it is not crazy to think others may want to link more than one TPL to their code and may find the suggested strategy a bit cumbersome.
Castro doesn't need KLU, but we do need Hypre for the radiation code, and I have struggled with this in the past. I too would like it to be the case (in both the CMake and GNUmake build systems) that there is a well-defined path for obtaining and building a dependency like Hypre that ensures it is built with the same compiler settings we're using to build the application. Otherwise we end up doing what Marc has to do now: manually manage that in several different application make systems.
But yes, @asalmgren, this is a capability I am requesting that does not yet exist, for a problem that has arisen recently in trying to use existing tools to migrate to a more complex workflow suggested by our collaborators. If this is not technically an "issue", I will close this ticket and seek other routes, but I thought the creative minds here might have a solution I hadn't thought of, or be able to throw together something that addresses my needs but may also benefit other AMReX applications (such as Nyx :) )
I concur that tracking which compiler version things are built with, and updating/comparing things appropriately, is non-trivial. I don't necessarily think a mega-build where amrex builds all the libraries will work, especially for those which are quite finicky (and take more than an hour to build). But CMake being able to use compile flags from other CMake library builds is useful.
Maybe you could use a Python script to control this. The script would take a number of options, translate them into the flags each library needs, build each TPL separately, and then compile PeleC and AMReX. Each time the script is called, it can remove all previous builds first. With ccache, this should not have a big performance impact. Note that the latest ccache works with CUDA.
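Something along these lines, as a rough sketch only; every path, CMake option name, and the SUNDIALS_ROOT make variable below are placeholders, not an existing tool:

```python
#!/usr/bin/env python3
# Rough sketch of the suggested driver script. Paths, CMake option names, and
# the SUNDIALS_ROOT make variable are placeholders for illustration only.
import argparse, os, shutil, subprocess

def run(cmd, cwd=None):
    print(" ".join(cmd))
    subprocess.check_call(cmd, cwd=cwd)

def build_sundials(src, prefix, use_cuda, use_klu):
    build = os.path.join(prefix, "build")
    shutil.rmtree(build, ignore_errors=True)        # remove any previous build first
    os.makedirs(build)
    args = ["cmake", src, "-DCMAKE_INSTALL_PREFIX=" + prefix,
            "-DMPI_ENABLE=OFF", "-DOPENMP_ENABLE=OFF",
            "-DCUDA_ENABLE=" + ("ON" if use_cuda else "OFF")]
    if use_klu:
        args.append("-DKLU_ENABLE=ON")
    run(args, cwd=build)
    run(["make", "-j8", "install"], cwd=build)

def build_app(app_dir, use_cuda, debug, sundials_prefix):
    run(["make", "-j8",
         "USE_CUDA=" + ("TRUE" if use_cuda else "FALSE"),
         "DEBUG=" + ("TRUE" if debug else "FALSE"),
         "SUNDIALS_ROOT=" + sundials_prefix],       # hypothetical make variable
        cwd=app_dir)

if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--cuda", action="store_true")
    p.add_argument("--klu", action="store_true")
    p.add_argument("--debug", action="store_true")
    a = p.parse_args()
    suffix = ".".join(s for s, on in [("CUDA", a.cuda), ("KLU", a.klu),
                                      ("DEBUG", a.debug)] if on) or "DEFAULT"
    prefix = os.path.abspath(os.path.join("ThirdParty", "sundials", suffix))
    build_sundials("path/to/sundials", prefix, a.cuda, a.klu)
    build_app("path/to/PeleLM/Exec/case", a.cuda, a.debug, prefix)
```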
@jmsexton03 That is a strategy I've seen work, where make dumps out a set of options that are then mined by other builds. I've seen it done kinda the other way around, where it dumps the specific commands used to link their libs into an app, but it's a good idea to think about whether we can cache a set of build options that can be parsed by a glue script that manages sub-makes (using whatever build system comes with that lib... even CMake... ick).
@WeiqunZhang That's kinda what I have, in a sense, right now. The icky part is translating options for driving CMake, etc. and just managing the build/install folders. Also, in our current make system, you generate a buildSuffix that captures all the variations we intend to support, and that defines the TPL variations we'd need, but that keeps growing/changing. Maybe there's a strategy where there is a leaf makefile template (or templates, one for each type of build system we encounter) that acts as the "glue" or go-between to drive auxiliary builds, once we find a way to abstract out the variations in a clever way.
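For instance, the glue could be as simple as having the make system dump a small key=value file of the options it was invoked with, which a helper script then parses; a minimal sketch, with a hypothetical file name and format:

```python
# Minimal sketch of the "glue" idea: parse a key=value file that the app's
# make system dumps (the file name and format here are hypothetical), then
# reuse those settings to configure the TPL builds consistently.
def read_build_options(path="tmp_build_dir/build_options.txt"):
    opts = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            opts[key.strip()] = val.strip()
    return opts

opts = read_build_options()
use_cuda = opts.get("USE_CUDA", "FALSE").upper() == "TRUE"
cxx      = opts.get("CXX", "g++")   # reuse the same compiler for the TPL builds
```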
@drummerdoc -- this workshop https://mail.google.com/mail/u/0/?tab=cm#inbox/FMfcgxwHNqMgxJNBtsdpcvzQMTzDrSnL might be relevant for PeleLM. They claim "The workshop will assist exascale code developers in learning how to resolve issues outside of their control and provide guidance on writing a build system generator capable of seamlessly configuring for multiple unique architectures with a variety of compilers." "Seamlessly configuring" sounds like what you want!
I admit that it would be useful to be more up on such tools.
Also, with input from Weiqun, I now have a GNUmake-based system for the current issue that functions reasonably, and I will be working to make it cleaner and more generic so that other AMReX users can consider it for similar purposes. I will close this issue when I can provide a direct link to my solution in a generic AMReX-based context for others to look over, and then I will welcome input and improvements... (even if they come in the form of CMake).