Tpetra LAI

Open klevzoff opened this issue 4 years ago • 5 comments

This PR adds support for a Tpetra-based LAI. It provides functionality similar to the other existing LA interfaces. It is not enabled by default, meaning the Epetra-based Trilinos interface remains the default. The plan is to get it used and tested, and eventually to replace the old one.

It takes advantage of the OpenMP and/or GPU acceleration available (via Kokkos) in the following Trilinos packages:

  • Tpetra (matrices/vectors)
  • Belos (Krylov solvers)
  • Ifpack2 (smoothers)
  • MueLu (multigrid)

There is still a host-based matrix copy involved; the issue of copy avoidance is considerably more complex and is not addressed here. For vectors, the copy from LvArray::ArrayView occurs on device, but another host copy gets allocated to support Tpetra's dual-view semantics. These copies cannot currently be eliminated due to a fundamental issue: Tpetra requires all of its device allocations to be in Unified Memory (most solvers, in particular the Krylov solvers and preconditioners, seem to work fine when given non-UM vector views, but the direct solver KLU2 fails). Issues in their tracker dating back 4 years suggest that relaxing that requirement will not be easy for them any time soon.
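To illustrate why the extra host copy exists: Tpetra vectors follow Kokkos dual-view semantics, where a host mirror and a device allocation each carry a "modified" flag that drives lazy synchronization. The following is a minimal self-contained sketch of that pattern with a hypothetical `DualBuffer` type (not the real Kokkos::DualView API; `std::vector` stands in for a device allocation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mock of dual-view semantics: two mirrored buffers plus
// "modified" flags that drive syncHost()/syncDevice() copies.
struct DualBuffer
{
  std::vector< double > host;
  std::vector< double > device; // stands in for a device-side allocation
  bool hostModified = false;
  bool deviceModified = false;

  explicit DualBuffer( std::size_t n ) : host( n, 0.0 ), device( n, 0.0 ) {}

  // Mark a side as modified after writing to it.
  void modifyHost()   { hostModified = true; }
  void modifyDevice() { deviceModified = true; }

  // Copy device -> host only if the device side is newer.
  void syncHost()
  {
    if( deviceModified ) { host = device; deviceModified = false; }
  }

  // Copy host -> device only if the host side is newer.
  void syncDevice()
  {
    if( hostModified ) { device = host; hostModified = false; }
  }
};
```

The point is that the host mirror must exist even when all computation happens on device, which is where the unavoidable host allocation in TpetraVector comes from.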

More specifically:

  • new classes TpetraVector, TpetraMatrix, TrilinosTpetraPreconditioner, TrilinosTpetraSolver and TrilinosTpetraInterface, with capabilities identical to the existing Trilinos/Hypre/Petsc classes
  • added a "direct" option for the preconditionerType input - it was mainly useful for troubleshooting BlockPreconditioner (by eliminating the possibility of sub-block solvers underperforming), and I decided to leave it in
  • improved/refactored unit test for vector classes, with a proper test for each of the main API calls
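For reference, the refactored vector tests follow a one-focused-test-per-API-call pattern. A rough sketch with a hypothetical `MockVector` stand-in (not the actual TpetraVector/HypreVector API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for a distributed vector class; the real
// tests exercise TpetraVector and its siblings through a common interface.
struct MockVector
{
  std::vector< double > values;

  void set( double v ) { for( double & x : values ) x = v; }

  void scale( double a ) { for( double & x : values ) x *= a; }

  double dot( MockVector const & y ) const
  {
    double s = 0.0;
    for( std::size_t i = 0; i < values.size(); ++i ) s += values[i] * y.values[i];
    return s;
  }
};

// One focused check per API call, mirroring the refactored unit tests.
void testSet()
{
  MockVector v{ std::vector< double >( 4 ) };
  v.set( 2.0 );
  assert( v.values[3] == 2.0 );
}

void testScale()
{
  MockVector v{ std::vector< double >( 4 ) };
  v.set( 2.0 );
  v.scale( 0.5 );
  assert( v.values[0] == 1.0 );
}

void testDot()
{
  MockVector v{ std::vector< double >( 4 ) };
  MockVector w{ std::vector< double >( 4 ) };
  v.set( 2.0 );
  w.set( 3.0 );
  assert( v.dot( w ) == 24.0 );
}
```

Keeping each API call in its own test makes it immediately clear which operation regresses when a backend (e.g. the new Tpetra one) misbehaves.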

The new interface:

  • [x] passes all unit tests, both without and with CUDA (though we don't run the latter on Travis)
  • [x] passes almost all integrated tests without CUDA, with a few exceptions listed below
  • [x] passes almost all single-rank integrated tests with CUDA (unfortunately, I can't run multi-rank tests with CUDA reliably)

Integrated tests that fail:

| Test | Status | Reason |
| --- | --- | --- |
| Hydrofracture_KGD_NodeBased_C3D6 (all) | FAIL_RUN | Only works with Epetra |
| KGD_ZeroToughness (all) | FAIL_RUN | Only works with Epetra |
| KGD_ZeroViscosity (all) | FAIL_RUN | Only works with Epetra |
| Sneddon (CUDA only) | FAIL_RUN | Breakdown in numerical factorization in KLU2, probably related to https://github.com/GEOSX/GEOSX/issues/1083 |
| beamBending (CUDA only) | FAIL_RUN | `Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory` during MueLu prolongation construction; new (did not fail earlier, including post-13.0 Trilinos update) |

I also need someone to test the PR (including the TPL build) on Lassen for possible build issues, configuring with -D GEOSX_LA_INTERFACE=TrilinosTpetra. Other than that, this is ready to merge; no rebaseline required.
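A possible configure invocation for such a test might look like the following (the host-config path and build directory names are hypothetical placeholders; only the GEOSX_LA_INTERFACE switch is specific to this PR):

```shell
# Hypothetical out-of-source configure; substitute the actual Lassen
# host-config. The only PR-specific option is GEOSX_LA_INTERFACE.
cmake -S GEOSX/src -B build-tpetra \
      -C host-configs/your-machine.cmake \
      -D GEOSX_LA_INTERFACE=TrilinosTpetra
cmake --build build-tpetra -j
```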

Related to: https://github.com/GEOSX/thirdPartyLibs/pull/120 https://github.com/GEOSX/LvArray/pull/189 https://github.com/GEOSX/GEOSX_PTP/pull/20 https://github.com/GEOSX/integratedTests/pull/84

klevzoff avatar Jul 30 '20 07:07 klevzoff

On my Ubuntu machine, I am able to run both ContactMechanics_SimpleCubes_0[8,9] integratedTests and they pass their checks. Do you think it is possible to have more details on the error that you see?

BTW, great PR @klevzoff! Thanks!

andrea-franceschini avatar Aug 10 '20 16:08 andrea-franceschini

Do you think it is possible to have more details on the error that you see?

Are you sure you compiled with GEOSX_LA_INTERFACE=TrilinosTpetra? Anyway, here's what I get after a few timesteps have already completed:

Received signal 6: Aborted

** StackTrace of 24 frames **
Frame 1: LvArray::system::stackTraceHandler(int, bool)
Frame 2: 
Frame 3: gsignal
Frame 4: abort
Frame 5: 
Frame 6: 
Frame 7: 
Frame 8: 
Frame 9: _ZN6Tpetra11Distributor7doPostsIN6Kokkos4ViewIPKdJNS2_11LayoutRightENS2_6DeviceINS2_6OpenMPENS2_9HostSpaceEEENS2_12MemoryTraitsILj0EEEEEENS3_IPdJS6_SA_EEEEENSt9enable_ifIXaasr6Kokkos4Impl7is_viewIT_EE5valuesr6Kokkos4Impl7is_viewIT0_EE5valueEvE4typeERKSH_mRKSI_
Frame 10: _ZN6Tpetra11Distributor15doPostsAndWaitsIN6Kokkos4ViewIPKdJNS2_11LayoutRightENS2_6DeviceINS2_6OpenMPENS2_9HostSpaceEEENS2_12MemoryTraitsILj0EEEEEENS3_IPdJS6_SA_EEEEENSt9enable_ifIXaasr6Kokkos4Impl7is_viewIT_EE5valuesr6Kokkos4Impl7is_viewIT0_EE5valueEvE4typeERKSH_mRKSI_
Frame 11: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doTransferNew(Tpetra::SrcDistObject const&, Tpetra::CombineMode, unsigned long, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Tpetra::Distributor&, Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ReverseOption, bool, bool)
Frame 12: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doTransfer(Tpetra::SrcDistObject const&, Tpetra::Details::Transfer<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, char const*, Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ReverseOption, Tpetra::CombineMode, bool)
Frame 13: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doImport(Tpetra::SrcDistObject const&, Tpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Tpetra::CombineMode, bool)
Frame 14: Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::put1dData(Teuchos::ArrayView<double const> const&, unsigned long, Teuchos::Ptr<Tpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const>, Amesos2::EDistribution)
Frame 15: Amesos2::Util::put_1d_data_helper<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, double>::do_put(Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > > const&, Teuchos::ArrayView<double> const&, unsigned long, Amesos2::EDistribution, long long)
Frame 16: Amesos2::KLU2<Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve_impl(Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > >, Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > const>) const
Frame 17: Amesos2::SolverCore<Amesos2::KLU2, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve(Teuchos::Ptr<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, Teuchos::Ptr<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const>) const
Frame 18: Amesos2::SolverCore<Amesos2::KLU2, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve()
Frame 19: geosx::TrilinosTpetraSolver::solve_direct(geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 20: geosx::TrilinosTpetraSolver::solve(geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&, geosx::DofManager const*)
Frame 21: geosx::SolverBase::SolveSystem(geosx::DofManager const&, geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 22: geosx::LagrangianContactSolver::SolveSystem(geosx::DofManager const&, geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 23: geosx::LagrangianContactSolver::NonlinearImplicitStep(double const&, double const&, int, geosx::DomainPartition&)
Frame 24: geosx::LagrangianContactSolver::SolverStep(double const&, double const&, int, geosx::DomainPartition&)

And the full log: ContactMechanics_SimpleCubes_08.data.txt

I will run it in debug later when I get a chance and hopefully find out what that error actually means.

klevzoff avatar Aug 10 '20 17:08 klevzoff

Yes, I did a clean build with -DGEOSX_LA_INTERFACE=TrilinosTpetra.

andrea-franceschini avatar Aug 10 '20 17:08 andrea-franceschini

On my Ubuntu machine, I am able to run both ContactMechanics_SimpleCubes_0[8,9] integratedTests and they pass their checks.

Turns out they pass for me too if I run each test separately, but fail when I run the whole test suite (and oversubscribe my cores). I'm not going to look into it any further; it's just my setup that makes it fail.

klevzoff avatar Aug 19 '20 19:08 klevzoff

I think this is as ready as it's going to be. Of the three failing tests that remain, two aren't expected to pass and one only has tiny diffs (see the updated PR description). I've also updated Trilinos to 13.0.0, released a couple of weeks ago, and it seems to be running fine.

Before this PR gets merged, I need someone to build the updated TPLs on LC (and preferably test the GEOSX PR build as well) and push updated host-configs.

klevzoff avatar Aug 21 '20 01:08 klevzoff