GEOS
GEOS copied to clipboard
Tpetra LAI
This PR adds support for Tpetra-based LAI. It provides similar functionality to other existing LA interfaces. It is not enabled by default, meaning the Epetra-based Trilinos interface is still default. The plan is to hopefully get it used and tested and eventually replace the old one.
It uses OpenMP and/or GPU acceleration that is available (via Kokkos) in Trilinos packages:
- Tpetra (matrices/vectors)
- Belos (Krylov solvers)
- Ifpack2 (smoothers)
- MueLu (multigrid)
There is still a host-based matrix copy involved - the issue of copy avoiding is a lot more complex and is not addressed here. For vectors, the copy from LvArray::ArrayView
occurs on device, however there is another host copy that gets allocated to support Tpetra's dual-view semantics. These cannot currently be eliminated due to a fundamental issue - Tpetra requires all its device allocations to be in Unified Memory (most solvers, in particular Krylov and preconditioners, seem to work fine when given non-UM vector views, but direct solver KLU2 fails). Some issues in their tracker dating back 4 years suggest relaxing that requirement may not be easy for them any time soon.
More specifically:
- new classes
TpetraVector
,TpetraMatrix
,TrilinosTpetraPreconditioner
,TrilinosTpetraSolver
andTrilinosTpetraInterface
, with identical capabilities to exising Trilinos/Hypre/Petsc classes - added a "direct" option for
preconditionerType
input - it was mainly useful to troubleshootBlockPreconditioner
(eliminating the possibility of sub-block solvers underperforming) and I decided to leave it in - improved/refactored unit test for vector classes, with a proper test for each of the main API calls
The new interface:
- [x] passes all unit tests, without and with CUDA (though we don't run the latter on Travis)
- [x] passes almost all integrated tests without CUDA, with a few exceptions listed below
- [x] passes almost all single-rank integrated tests with CUDA (unfortunately, I can't run multi-rank tests with CUDA reliably)
Integrated tests that fail:
Test | Status | Reason |
---|---|---|
Hydrofracture_KGD_NodeBased_C3D6 (all) | FAIL RUN | Only works with Epetra |
KGD_ZeroToughness (all) | FAIL_RUN | Only works with Epetra |
KGD_ZeroViscosity (all) | FAIL_RUN | Only works with Epetra |
Sneddon (CUDA only) | FAIL RUN | breakdown in numerical factorization in KLU2, probably related to https://github.com/GEOSX/GEOSX/issues/1083 |
beamBending (CUDA only) | FAIL_RUN | Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory during MueLu prolongation construction, new (did not fail earlier, including post-13.0 Trilinos update) |
I also need someone to test the PR (including TPL build) on Lassen for possible build issues, configuring with -D GEOSX_LA_INTERFACE=TrilinosTpetra
. Other than that, this is ready to merge, rebaseline not required.
Related to: https://github.com/GEOSX/thirdPartyLibs/pull/120 https://github.com/GEOSX/LvArray/pull/189 https://github.com/GEOSX/GEOSX_PTP/pull/20 https://github.com/GEOSX/integratedTests/pull/84
On my Ubuntu machine, I am able to run both the two ContactMechanics_SimpleCubes_0[8,9]
integratedTests and they pass their checks.
Do you think is it possible to have more details on the error that you see?
BTW, great PR @klevzoff! Thanks!
Do you think is it possible to have more details on the error that you see?
Are you sure you compiled with GEOSX_LA_INTERFACE=TrilinosTpetra
? Anyway, here's what I get after a few timesteps already done:
Received signal 6: Aborted
** StackTrace of 24 frames **
Frame 1: LvArray::system::stackTraceHandler(int, bool)
Frame 2:
Frame 3: gsignal
Frame 4: abort
Frame 5:
Frame 6:
Frame 7:
Frame 8:
Frame 9: _ZN6Tpetra11Distributor7doPostsIN6Kokkos4ViewIPKdJNS2_11LayoutRightENS2_6DeviceINS2_6OpenMPENS2_9HostSpaceEEENS2_12MemoryTraitsILj0EEEEEENS3_IPdJS6_SA_EEEEENSt9enable_ifIXaasr6Kokkos4Impl7is_viewIT_EE5valuesr6Kokkos4Impl7is_viewIT0_EE5valueEvE4typeERKSH_mRKSI_
Frame 10: _ZN6Tpetra11Distributor15doPostsAndWaitsIN6Kokkos4ViewIPKdJNS2_11LayoutRightENS2_6DeviceINS2_6OpenMPENS2_9HostSpaceEEENS2_12MemoryTraitsILj0EEEEEENS3_IPdJS6_SA_EEEEENSt9enable_ifIXaasr6Kokkos4Impl7is_viewIT_EE5valuesr6Kokkos4Impl7is_viewIT0_EE5valueEvE4typeERKSH_mRKSI_
Frame 11: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doTransferNew(Tpetra::SrcDistObject const&, Tpetra::CombineMode, unsigned long, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Tpetra::Distributor&, Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ReverseOption, bool, bool)
Frame 12: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doTransfer(Tpetra::SrcDistObject const&, Tpetra::Details::Transfer<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, char const*, Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ReverseOption, Tpetra::CombineMode, bool)
Frame 13: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doImport(Tpetra::SrcDistObject const&, Tpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Tpetra::CombineMode, bool)
Frame 14: Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::put1dData(Teuchos::ArrayView<double const> const&, unsigned long, Teuchos::Ptr<Tpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const>, Amesos2::EDistribution)
Frame 15: Amesos2::Util::put_1d_data_helper<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, double>::do_put(Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > > const&, Teuchos::ArrayView<double> const&, unsigned long, Amesos2::EDistribution, long long)
Frame 16: Amesos2::KLU2<Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve_impl(Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > >, Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > const>) const
Frame 17: Amesos2::SolverCore<Amesos2::KLU2, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve(Teuchos::Ptr<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, Teuchos::Ptr<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const>) const
Frame 18: Amesos2::SolverCore<Amesos2::KLU2, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve()
Frame 19: geosx::TrilinosTpetraSolver::solve_direct(geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 20: geosx::TrilinosTpetraSolver::solve(geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&, geosx::DofManager const*)
Frame 21: geosx::SolverBase::SolveSystem(geosx::DofManager const&, geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 22: geosx::LagrangianContactSolver::SolveSystem(geosx::DofManager const&, geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 23: geosx::LagrangianContactSolver::NonlinearImplicitStep(double const&, double const&, int, geosx::DomainPartition&)
Frame 24: geosx::LagrangianContactSolver::SolverStep(double const&, double const&, int, geosx::DomainPartition&)
And the full log: ContactMechanics_SimpleCubes_08.data.txt
I will run it in debug later when I get a chance and hopefully find out what that error actually means.
Yes, I did a clean build with -DGEOSX_LA_INTERFACE=TrilinosTpetra
.
On my Ubuntu machine, I am able to run both the two
ContactMechanics_SimpleCubes_0[8,9]
integratedTests and they pass their checks.
Turns out they pass for me too if I run each test separately, but fail when I run the whole test suite (and oversubscribe my cores). Not going to look into it anymore, it's just my setup that makes it fail.
I think this is as ready as it's going to be. Of the three failing tests that remain, two aren't expected to pass and one only has tiny diffs (see updated PR description). I've also updated Trilinos to 13.0.0 released a couple weeks ago and it seems to be running fine.
Before getting this PR merged, I need someone to build the updated TPLs on LC (and preferably test the build of GEOSX PR as well) and push updated host-configs.