MueLu: Compilation and runtime errors in OpenMP+MKL, Cuda 12, and HIP builds

Open maartenarnst opened this issue 1 year ago • 9 comments

Bug Report

@trilinos/muelu @jhux2 @cgcgcg

Description

We are compiling Trilinos with MueLu enabled in several builds:

  • OpenMP with MKL enabled
  • Cuda 12 without UVM with CuSparse enabled
  • Cuda 12 with UVM with CuSparse enabled
  • HIP

We are seeing several compilation errors and runtime errors.

In the OpenMP + MKL builds, several issues in the MKL-related code in muelu/test/scaling/JacobiKernelDriver.cpp, muelu/test/scaling/MMKernelDriver.cpp, and muelu/test/scaling/TwoMatrixMMKernelDriver.cpp cause compilation and runtime errors:

  • The type crs_matrix_type::local_matrix_type and Tpetra functions such as getLocalMatrix are used, which leads to compilation errors. Using crs_matrix_type::local_matrix_device_type and getLocalMatrixDevice seems to solve the issue, as does using crs_matrix_type::local_matrix_host_type and getLocalMatrixHost, since this is host code.
  • Calls to fences like KCRS::execution_space::fence(); cause compilation errors. It seems typename KCRS::execution_space().fence(); solves the issue.
  • In muelu/test/scaling/MMKernelDriver.cpp, in the definition of the macro MMKD_MKL_ERROR_CHECK, wrapping the curly braces in parentheses, as in ({ ... }), causes a compilation error. It seems that removing the parentheses solves the issue.
  • In muelu/test/scaling/JacobiKernelDriver.cpp and muelu/test/scaling/TwoMatrixMMKernelDriver.cpp, a call to getLocalMatrix (or, once updated, getLocalMatrixDevice or getLocalMatrixHost) on an Xpetra matrix C causes runtime segmentation faults, apparently because C is empty. Using functions like setAllValues to set the views in C seems to solve the issue.
  • In muelu/test/scaling/TwoMatrixMMKernelDriver.cpp, the successive use of TimeMonitor causes runtime errors. It seems they can be solved either by putting each TimeMonitor in its own scope or by stopping each one by setting its RCP to null before the next is created.
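
One possible explanation for the TimeMonitor item, which is not confirmed in this issue, is the order of start and stop events when an owning pointer is reassigned: the new timer is constructed (and started) before the old one is destroyed (and stopped). The sketch below illustrates that ordering in plain C++. `TimerStack` and `ScopedTimer` are hypothetical stand-ins, not Teuchos classes, and `std::unique_ptr` stands in for `RCP`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a registry that expects timers to stop in the
// reverse order in which they were started.
struct TimerStack {
  std::vector<std::string> running;
  bool order_violated = false;

  void start(const std::string& name) { running.push_back(name); }

  void stop(const std::string& name) {
    if (!running.empty() && running.back() == name) {
      running.pop_back();
      return;
    }
    // Out-of-order stop: record it (a real library might throw instead)
    // and remove the timer wherever it sits.
    order_violated = true;
    for (auto it = running.begin(); it != running.end(); ++it) {
      if (*it == name) { running.erase(it); break; }
    }
  }
};

// Hypothetical RAII timer: starts on construction, stops on destruction.
class ScopedTimer {
 public:
  ScopedTimer(TimerStack& stack, std::string name)
      : stack_(stack), name_(std::move(name)) { stack_.start(name_); }
  ~ScopedTimer() { stack_.stop(name_); }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  TimerStack& stack_;
  std::string name_;
};

// Reassignment: "Multiply 2" starts before "Multiply 1" stops.
bool reassign_pattern_violates_order() {
  TimerStack stack;
  std::unique_ptr<ScopedTimer> tm;
  tm = std::make_unique<ScopedTimer>(stack, "Multiply 1");
  tm = std::make_unique<ScopedTimer>(stack, "Multiply 2");
  tm.reset();
  return stack.order_violated;
}

// One scope per timer: each timer stops before the next one starts.
bool scoped_pattern_violates_order() {
  TimerStack stack;
  { ScopedTimer t1(stack, "Multiply 1"); }
  { ScopedTimer t2(stack, "Multiply 2"); }
  return stack.order_violated;
}

// Explicit reset before reuse, analogous to setting the RCP to null.
bool reset_pattern_violates_order() {
  TimerStack stack;
  std::unique_ptr<ScopedTimer> tm;
  tm = std::make_unique<ScopedTimer>(stack, "Multiply 1");
  tm.reset();
  tm = std::make_unique<ScopedTimer>(stack, "Multiply 2");
  tm.reset();
  return stack.order_violated;
}
```

In this toy model only the reassignment pattern produces an out-of-order stop; the two remedies named in the last item above (scopes, or resetting the pointer first) both keep starts and stops properly nested.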

In the Cuda 12 builds,

  • Calls to CuSparse functions for SPMV in muelu/test/scaling/MatvecKernelDriver.cpp cause compilation errors. The flag CUSPARSE_MV_ALG_DEFAULT was renamed to CUSPARSE_SPMV_ALG_DEFAULT in CuSparse; the patch selects the new name when CUSPARSE_VERSION >= 11201.
  • In muelu/test/scaling/MatvecKernelDriver.cpp, uses of getLocalView<device_type> cause runtime errors related to WrappedDualView host/device counts. Using getLocalViewDevice with Tpetra access specifiers instead, and moving that code into the scope where the views are actually used, seems to solve the issue.
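
To illustrate why the lifetime of a local view can matter for host/device count checks, here is a toy model in plain C++. It is not Tpetra's implementation: `ToyDualView` is a hypothetical class that uses `std::shared_ptr` use counts as a stand-in for view reference counts, and it does not model any host/device synchronization. The assumption it illustrates is that a check of this kind refuses access on one side while a handle to the other side is still alive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Toy model (hypothetical): two buffers, each handed out as a shared handle.
// Access to one side is refused while a handle to the other side is alive.
class ToyDualView {
 public:
  using handle_type = std::shared_ptr<std::vector<double>>;

  explicit ToyDualView(std::size_t n)
      : host_(std::make_shared<std::vector<double>>(n)),
        device_(std::make_shared<std::vector<double>>(n)) {}

  handle_type getHostView() {
    // use_count() > 1 means someone outside this object still holds a handle.
    if (device_.use_count() > 1)
      throw std::runtime_error("host access while a device view is alive");
    return host_;
  }

  handle_type getDeviceView() {
    if (host_.use_count() > 1)
      throw std::runtime_error("device access while a host view is alive");
    return device_;
  }

 private:
  handle_type host_;
  handle_type device_;
};

// A device handle taken early and kept alive blocks later host access.
bool long_lived_handle_blocks_host_access() {
  ToyDualView x(4);
  auto x_dev = x.getDeviceView();
  try {
    x.getHostView();
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}

// A device handle confined to a small scope is released before host access.
bool scoped_handle_allows_host_access() {
  ToyDualView x(4);
  {
    auto x_dev = x.getDeviceView();
    (*x_dev)[0] = 1.0;
  }  // x_dev is released here
  try {
    x.getHostView();
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}
```

Under this assumption, taking the views right where they are consumed, inside a narrow scope, keeps the counts balanced by the time any other access happens, which matches the direction of the fix described in the item above.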

In the Cuda 12 without UVM and HIP builds,

  • A missing _h in muelu/test/unit_tests/RegionMatrix.cpp results in invalid memory space host/device access. Adding the _h solves the issue.

The patch at the end of this issue collects all these changes. I would be happy to make a PR with it.

With the changes in the patch applied, compilation succeeds without errors in all of the above-mentioned builds.

However, even after these changes, runtime errors still occur in several MueLu tests in the Cuda 12 without UVM and HIP builds:

  • In the Cuda 12 without UVM builds, runtime errors related to WrappedDualView host/device counts occur in MueLu_Structured_Region_Example_Elasticity3D_MPI_1, MueLu_Structured_Region_Example_Poisson3D_27pt_stencil_AMG_MPI_1, MueLu_Structured_Region_Example_Poisson3D_27pt_stencil_AMG_MPI_4, MueLu_UnitTestsRegion_MPI_1, and MueLu_UnitTestsRegion_MPI_4. They may be related to calls to functions like getDataNonConst in muelu/research/regionMG/src/SetupRegionHierarchy_def.hpp.
  • In the HIP build, these same runtime errors occur. There is also an additional runtime error in ParameterListInterpreter_double_int_longlong_Kokkos_Compat_KokkosHIPWrapperNode_BlockCrs_UnitTest in MueLu_UnitTestsTpetra_MPI_1. It refers to /home/costmo-user/Trilinos/packages/ifpack2/src/Ifpack2_Relaxation_def.hpp:977 and says that GETF2 or GETRI failed on = 1 diagonal blocks.
From 9719e97fb9462a1510ff706cee3d384278051619 Mon Sep 17 00:00:00 2001
From: Maarten Arnst <[email protected]>
Date: Wed, 8 Mar 2023 09:53:06 +0100
Subject: [PATCH] Fixes for MKL and cuda 12.

---
 .../muelu/test/scaling/JacobiKernelDriver.cpp | 40 ++++++------
 .../muelu/test/scaling/MMKernelDriver.cpp     | 18 +++---
 .../muelu/test/scaling/MatvecKernelDriver.cpp | 18 ++++--
 .../test/scaling/TwoMatrixMMKernelDriver.cpp  | 64 +++++++++++--------
 .../muelu/test/unit_tests/RegionMatrix.cpp    |  4 +-
 5 files changed, 82 insertions(+), 62 deletions(-)

diff --git a/packages/muelu/test/scaling/JacobiKernelDriver.cpp b/packages/muelu/test/scaling/JacobiKernelDriver.cpp
index 154ee5f2750..340bf80da49 100644
--- a/packages/muelu/test/scaling/JacobiKernelDriver.cpp
+++ b/packages/muelu/test/scaling/JacobiKernelDriver.cpp
@@ -107,13 +107,13 @@ void Jacobi_MKL_SPMM(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,Node
 #ifdef HAVE_MUELU_TPETRA
     typedef Tpetra::CrsMatrix<Scalar,LocalOrdinal,GlobalOrdinal,Node> crs_matrix_type;
     typedef Tpetra::Vector<Scalar,LocalOrdinal,GlobalOrdinal,Node>    vector_type;
-    typedef typename crs_matrix_type::local_matrix_type    KCRS;
-    typedef typename KCRS::StaticCrsGraphType              graph_t;
-    typedef typename graph_t::row_map_type::non_const_type lno_view_t;
-    typedef typename graph_t::row_map_type::const_type     c_lno_view_t;
-    typedef typename graph_t::entries_type::non_const_type lno_nnz_view_t;
-    typedef typename graph_t::entries_type::const_type     c_lno_nnz_view_t;
-    typedef typename KCRS::values_type::non_const_type     scalar_view_t;
+    typedef typename crs_matrix_type::local_matrix_device_type KCRS;
+    typedef typename KCRS::StaticCrsGraphType                  graph_t;
+    typedef typename graph_t::row_map_type::non_const_type     lno_view_t;
+    typedef typename graph_t::row_map_type::const_type         c_lno_view_t;
+    typedef typename graph_t::entries_type::non_const_type     lno_nnz_view_t;
+    typedef typename graph_t::entries_type::const_type         c_lno_nnz_view_t;
+    typedef typename KCRS::values_type::non_const_type         scalar_view_t;
 
     typedef typename vector_type::device_type              device_type;
     typedef typename Kokkos::View<MKL_INT*,typename lno_nnz_view_t::array_layout,typename lno_nnz_view_t::device_type> mkl_int_type;
@@ -121,21 +121,22 @@ void Jacobi_MKL_SPMM(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,Node
     RCP<const crs_matrix_type> Au = Utilities::Op2TpetraCrs(rcp(&A,false));
     RCP<const crs_matrix_type> Bu = Utilities::Op2TpetraCrs(rcp(&B,false));
     RCP<const crs_matrix_type> Cu = Utilities::Op2TpetraCrs(rcp(&C,false));
+    RCP<crs_matrix_type> Cnc = Teuchos::rcp_const_cast<crs_matrix_type>(Cu);
     RCP<const vector_type> Du = Xpetra::toTpetra(D);
 
-    const KCRS & Amat = Au->getLocalMatrix();
-    const KCRS & Bmat = Bu->getLocalMatrix();
-    KCRS Cmat = Cu->getLocalMatrix();
+    const KCRS & Amat = Au->getLocalMatrixDevice();
+    const KCRS & Bmat = Bu->getLocalMatrixDevice();
+
     if(A.getLocalNumRows()!=C.getLocalNumRows())  throw std::runtime_error("C is not sized correctly");
 
     c_lno_view_t Arowptr = Amat.graph.row_map, Browptr = Bmat.graph.row_map;
     lno_view_t Crowptr("Crowptr",C.getLocalNumRows()+1);
     c_lno_nnz_view_t Acolind = Amat.graph.entries, Bcolind = Bmat.graph.entries;
-    lno_nnz_view_t Ccolind = Cmat.graph.entries;
+    lno_nnz_view_t Ccolind;
     const scalar_view_t Avals = Amat.values, Bvals = Bmat.values;
-    scalar_view_t Cvals = Cmat.values;
+    scalar_view_t Cvals;
 
-    auto Dvals = Du->template getLocalView<device_type>();
+    auto Dvals = Du->getLocalViewDevice(Tpetra::Access::ReadOnly);
 
     sparse_matrix_t AMKL;
     sparse_matrix_t BMKL;
@@ -194,21 +195,21 @@ void Jacobi_MKL_SPMM(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,Node
     // Multiply (A*B)
     tm = rcp(new TimeMonitor(*TimeMonitor::getNewTimer("Jacobi MKL: Multiply")));
     result = mkl_sparse_spmm(SPARSE_OPERATION_NON_TRANSPOSE, AMKL, BMKL, &XTempMKL);
-    KCRS::execution_space::fence();
+    typename KCRS::execution_space().fence();
     if(result != SPARSE_STATUS_SUCCESS) throw std::runtime_error("MKL Multiply failed");
 
     // **********************************
     // Scale (-omegaD) * AB)
     tm = rcp(new TimeMonitor(*TimeMonitor::getNewTimer("Jacobi MKL: Scale-Via-Multiply")));
     result = mkl_sparse_spmm(SPARSE_OPERATION_NON_TRANSPOSE, DMKL, XTempMKL, &YTempMKL);
-    KCRS::execution_space::fence();
+    typename KCRS::execution_space().fence();
     if(result != SPARSE_STATUS_SUCCESS) throw std::runtime_error("MKL Scale failed");
 
     // **********************************
     // Add B - ((-omegaD) * AB))
     tm = rcp(new TimeMonitor(*TimeMonitor::getNewTimer("Jacobi MKL: Add")));
     result = mkl_sparse_d_add(SPARSE_OPERATION_NON_TRANSPOSE,BMKL,1.0,YTempMKL,&CMKL);
-    KCRS::execution_space::fence();
+    typename KCRS::execution_space().fence();
     if(result != SPARSE_STATUS_SUCCESS) throw std::runtime_error("MKL Add failed");
 
     // **********************************
@@ -226,9 +227,10 @@ void Jacobi_MKL_SPMM(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,Node
     copy_view_n(cnnz,columns,Ccolind);
     copy_view_n(cnnz,values,Cvals);
 
-    Cmat.graph.row_map = Crowptr;
-    Cmat.graph.entries = Ccolind;
-    Cmat.values = Cvals;
+    Cnc->replaceColMap(Bu->getColMap());
+    Cnc->setAllValues(Crowptr,
+                      Ccolind,
+                      Cvals);
 
     mkl_sparse_destroy(AMKL);
     mkl_sparse_destroy(BMKL);
diff --git a/packages/muelu/test/scaling/MMKernelDriver.cpp b/packages/muelu/test/scaling/MMKernelDriver.cpp
index c304a03a979..be13f478010 100644
--- a/packages/muelu/test/scaling/MMKernelDriver.cpp
+++ b/packages/muelu/test/scaling/MMKernelDriver.cpp
@@ -382,7 +382,7 @@ void Multiply_ViennaCL(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,No
 #include "mkl_types.h"                                  // for MKL_INT
 //#include "mkl.h"
 
-#define MMKD_MKL_ERROR_CHECK(rc) ({ \
+#define MMKD_MKL_ERROR_CHECK(rc) { \
 if (mkl_rc != SPARSE_STATUS_SUCCESS) { \
   std::stringstream ss;  \
   switch (mkl_rc) { \
@@ -410,7 +410,7 @@ if (mkl_rc != SPARSE_STATUS_SUCCESS) { \
   std::cerr << ss.str () << std::endl; \
   return; \
 } \
-})
+}
 
   // mkl_sparse_spmm
 template<class Scalar, class LocalOrdinal, class GlobalOrdinal, class Node>
@@ -425,13 +425,13 @@ void Multiply_MKL_SPMM(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,No
   RCP<TimeMonitor> tm;
 #if defined(HAVE_MUELU_TPETRA)
     typedef Tpetra::CrsMatrix<Scalar,LocalOrdinal,GlobalOrdinal,Node> crs_matrix_type;
-    typedef typename crs_matrix_type::local_matrix_type    KCRS;
-    typedef typename KCRS::StaticCrsGraphType              graph_t;
-    typedef typename graph_t::row_map_type::non_const_type lno_view_t;
-    typedef typename graph_t::row_map_type::const_type     c_lno_view_t;
-    typedef typename graph_t::entries_type::non_const_type lno_nnz_view_t;
-    typedef typename graph_t::entries_type::const_type     c_lno_nnz_view_t;
-    typedef typename KCRS::values_type::non_const_type     scalar_view_t;
+    typedef typename crs_matrix_type::local_matrix_device_type KCRS;
+    typedef typename KCRS::StaticCrsGraphType                  graph_t;
+    typedef typename graph_t::row_map_type::non_const_type     lno_view_t;
+    typedef typename graph_t::row_map_type::const_type         c_lno_view_t;
+    typedef typename graph_t::entries_type::non_const_type     lno_nnz_view_t;
+    typedef typename graph_t::entries_type::const_type         c_lno_nnz_view_t;
+    typedef typename KCRS::values_type::non_const_type         scalar_view_t;
 
     typedef typename Kokkos::View<MKL_INT*,typename lno_nnz_view_t::array_layout,typename lno_nnz_view_t::device_type> mkl_int_type;
 
diff --git a/packages/muelu/test/scaling/MatvecKernelDriver.cpp b/packages/muelu/test/scaling/MatvecKernelDriver.cpp
index 060ab61189f..6428614d194 100644
--- a/packages/muelu/test/scaling/MatvecKernelDriver.cpp
+++ b/packages/muelu/test/scaling/MatvecKernelDriver.cpp
@@ -631,6 +631,12 @@ public:
   cusparseStatus_t spmv(const Scalar alpha, const Scalar beta) {
     // compute: y = alpha*Ax + beta*y
 
+#if CUSPARSE_VERSION >= 11201
+    cusparseSpMVAlg_t alg = CUSPARSE_SPMV_ALG_DEFAULT;
+#else
+    cusparseSpMVAlg_t alg = CUSPARSE_MV_ALG_DEFAULT;
+#endif
+
     size_t bufferSize;
     CHECK_CUSPARSE(cusparseSpMV_bufferSize(cusparseHandle,
                                            transA,
@@ -640,7 +646,7 @@ public:
                                            &beta,
                                            vecY,
                                            CUDA_R_64F,
-                                           CUSPARSE_MV_ALG_DEFAULT,
+                                           alg,
                                            &bufferSize));
 
     void* dBuffer = NULL;
@@ -654,7 +660,7 @@ public:
                                        &beta,
                                        vecY,
                                        CUDA_R_64F,
-                                       CUSPARSE_MV_ALG_DEFAULT,
+                                       alg,
                                        dBuffer);
 
     CHECK_CUDA(cudaFree(dBuffer));
@@ -1055,10 +1061,6 @@ int main_(Teuchos::CommandLineProcessor &clp, Xpetra::UnderlyingLib& lib, int ar
                               ArowptrMKL.data()+1,
                               AcolindMKL.data(),
                               (double*)Avals.data());
-      auto X_lcl = xt.template getLocalView<device_type> ();
-      auto Y_lcl = yt.template getLocalView<device_type> ();
-      mkl_xdouble = (double*)X_lcl.data();
-      mkl_ydouble = (double*)Y_lcl.data();
     }
     else
       throw std::runtime_error("MKL Type Mismatch");
@@ -1121,6 +1123,10 @@ int main_(Teuchos::CommandLineProcessor &clp, Xpetra::UnderlyingLib& lib, int ar
         case Experiments::MKL:
         {
             TimeMonitor t(*TimeMonitor::getNewTimer("MV MKL: Total"));
+            auto X_lcl = xt.getLocalViewDevice(Tpetra::Access::ReadOnly);
+            auto Y_lcl = yt.getLocalViewDevice(Tpetra::Access::OverwriteAll);
+            mkl_xdouble = (double*)X_lcl.data();
+            mkl_ydouble = (double*)Y_lcl.data();
             MV_MKL(mkl_A,mkl_xdouble,mkl_ydouble);
         }
           break;
diff --git a/packages/muelu/test/scaling/TwoMatrixMMKernelDriver.cpp b/packages/muelu/test/scaling/TwoMatrixMMKernelDriver.cpp
index e4ff9877cae..9913063ba84 100644
--- a/packages/muelu/test/scaling/TwoMatrixMMKernelDriver.cpp
+++ b/packages/muelu/test/scaling/TwoMatrixMMKernelDriver.cpp
@@ -139,13 +139,13 @@ void MM2_MKL(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,Node> &A, co
   RCP<TimeMonitor> tm;
 #ifdef HAVE_MUELU_TPETRA
     typedef Tpetra::CrsMatrix<Scalar,LocalOrdinal,GlobalOrdinal,Node> crs_matrix_type;
-    typedef typename crs_matrix_type::local_matrix_type    KCRS;
-    typedef typename KCRS::StaticCrsGraphType              graph_t;
-    typedef typename graph_t::row_map_type::non_const_type lno_view_t;
-    typedef typename graph_t::row_map_type::const_type     c_lno_view_t;
-    typedef typename graph_t::entries_type::non_const_type lno_nnz_view_t;
-    typedef typename graph_t::entries_type::const_type     c_lno_nnz_view_t;
-    typedef typename KCRS::values_type::non_const_type     scalar_view_t;
+    typedef typename crs_matrix_type::local_matrix_device_type KCRS;
+    typedef typename KCRS::StaticCrsGraphType                  graph_t;
+    typedef typename graph_t::row_map_type::non_const_type     lno_view_t;
+    typedef typename graph_t::row_map_type::const_type         c_lno_view_t;
+    typedef typename graph_t::entries_type::non_const_type     lno_nnz_view_t;
+    typedef typename graph_t::entries_type::const_type         c_lno_nnz_view_t;
+    typedef typename KCRS::values_type::non_const_type         scalar_view_t;
     typedef Tpetra::Map<LO,GO,NO>                                     map_type;
     typedef typename map_type::local_map_type                         local_map_type;
 
@@ -156,19 +156,20 @@ void MM2_MKL(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,Node> &A, co
     RCP<const crs_matrix_type> B1u = Utilities::Op2TpetraCrs(rcp(&B1,false));
     RCP<const crs_matrix_type> B2u = Utilities::Op2TpetraCrs(rcp(&B2,false));
     RCP<const crs_matrix_type> Cu = Utilities::Op2TpetraCrs(rcp(&C,false));
+    RCP<crs_matrix_type> Cnc = Teuchos::rcp_const_cast<crs_matrix_type>(Cu);
+
+    const KCRS & Amat = Au->getLocalMatrixDevice();
+    const KCRS & B1mat = B1u->getLocalMatrixDevice();
+    const KCRS & B2mat = B2u->getLocalMatrixDevice();
 
-    const KCRS & Amat = Au->getLocalMatrix();
-    const KCRS & B1mat = B1u->getLocalMatrix();
-    const KCRS & B2mat = B2u->getLocalMatrix();
-    KCRS Cmat = Cu->getLocalMatrix();
     if(A.getLocalNumRows()!=C.getLocalNumRows())  throw std::runtime_error("C is not sized correctly");
 
     c_lno_view_t Arowptr = Amat.graph.row_map, B1rowptr = B1mat.graph.row_map, B2rowptr = B2mat.graph.row_map;
     lno_view_t Crowptr("Crowptr",C.getLocalNumRows()+1);
     c_lno_nnz_view_t Acolind = Amat.graph.entries, B1colind = B1mat.graph.entries, B2colind = B2mat.graph.entries;
-    lno_nnz_view_t Ccolind = Cmat.graph.entries;
+    lno_nnz_view_t Ccolind;
     const scalar_view_t Avals = Amat.values, B1vals = B1mat.values, B2vals = B2mat.values;
-    scalar_view_t Cvals = Cmat.values;
+    scalar_view_t Cvals;
     RCP<const Tpetra::Map<LO,GO,Node> > Ccolmap_t = Xpetra::toTpetra(Ccolmap);
     local_map_type Bcolmap_local = B1u->getColMap()->getLocalMap();
     local_map_type Icolmap_local = B2u->getColMap()->getLocalMap();
@@ -233,39 +234,49 @@ void MM2_MKL(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,Node> &A, co
     if(algorithm_name == "MULT_ADD" ) {
       // **********************************
       // Multiply #1 (A*B1)
-      tm = rcp(new TimeMonitor(*TimeMonitor::getNewTimer("MM2 MKL MULT_ADD: Multiply 1")));
+      {
+      TimeMonitor tm1(*TimeMonitor::getNewTimer("MM2 MKL MULT_ADD: Multiply 1"));
       result = mkl_sparse_spmm(SPARSE_OPERATION_NON_TRANSPOSE, AMKL, B1MKL, &Temp1MKL);
-      KCRS::execution_space::fence();
+      typename KCRS::execution_space().fence();
       if(result != SPARSE_STATUS_SUCCESS) throw std::runtime_error("MKL Multiply 1 failed: "+mkl_error(result));
+      }
 
       // **********************************
       // Multiply #2 (A*B2)
-      tm = rcp(new TimeMonitor(*TimeMonitor::getNewTimer("MM2 MKL MULT_ADD: Multiply 2")));
+      {
+      TimeMonitor tm2(*TimeMonitor::getNewTimer("MM2 MKL MULT_ADD: Multiply 2"));
       result = mkl_sparse_spmm(SPARSE_OPERATION_NON_TRANSPOSE, AMKL, B1MKL, &Temp2MKL);
-      KCRS::execution_space::fence();
+      typename KCRS::execution_space().fence();
       if(result != SPARSE_STATUS_SUCCESS) throw std::runtime_error("MKL Multiply 2 failed: "+mkl_error(result));
+      }
 
       // **********************************
       // Add (A*B1) + (A*B2)
-      tm = rcp(new TimeMonitor(*TimeMonitor::getNewTimer("MM2 MKL MULT_ADD: Add")));
+      {
+      TimeMonitor tm3(*TimeMonitor::getNewTimer("MM2 MKL MULT_ADD: Add"));
       result = mkl_sparse_d_add(SPARSE_OPERATION_NON_TRANSPOSE,Temp1MKL,1.0,Temp2MKL,&CMKL);
-      KCRS::execution_space::fence();
+      typename KCRS::execution_space().fence();
       if(result != SPARSE_STATUS_SUCCESS) throw std::runtime_error("MKL Add failed: "+mkl_error(result));
+      }
     }
     else if(algorithm_name == "ADD_MULT" ) {
       // **********************************
       // Add B1 + B2
-      tm = rcp(new TimeMonitor(*TimeMonitor::getNewTimer("MM2 MKL ADD_MULT: Add")));
+      {
+      TimeMonitor tm4(*TimeMonitor::getNewTimer("MM2 MKL ADD_MULT: Add"));
       result = mkl_sparse_d_add(SPARSE_OPERATION_NON_TRANSPOSE,B1MKL,1.0,B2MKL,&Temp1MKL);
-      KCRS::execution_space::fence();
+      typename KCRS::execution_space().fence();
       if(result != SPARSE_STATUS_SUCCESS) throw std::runtime_error("MKL Add failed: "+mkl_error(result));
+      }
 
       // **********************************
       // Multiply A*(B1+B2)
-      tm = rcp(new TimeMonitor(*TimeMonitor::getNewTimer("MM2 MKL ADD_MULT: Multiply")));
+      {
+      TimeMonitor tm5(*TimeMonitor::getNewTimer("MM2 MKL ADD_MULT: Multiply"));
       result = mkl_sparse_spmm(SPARSE_OPERATION_NON_TRANSPOSE, AMKL, Temp1MKL, &CMKL);
-      KCRS::execution_space::fence();
+      typename KCRS::execution_space().fence();
       if(result != SPARSE_STATUS_SUCCESS) throw std::runtime_error("MKL Multiply failed: "+mkl_error(result));
+      }
     }
     else
       throw std::runtime_error("Invalid MKL algorithm");
@@ -285,9 +296,10 @@ void MM2_MKL(const Xpetra::Matrix<Scalar,LocalOrdinal,GlobalOrdinal,Node> &A, co
     copy_view_n(cnnz,columns,Ccolind);
     copy_view_n(cnnz,values,Cvals);
 
-    Cmat.graph.row_map = Crowptr;
-    Cmat.graph.entries = Ccolind;
-    Cmat.values = Cvals;
+    Cnc->replaceColMap(Ccolmap_t);
+    Cnc->setAllValues(Crowptr,
+                      Ccolind,
+                      Cvals);
 
     mkl_sparse_destroy(AMKL);
     mkl_sparse_destroy(B1MKL);
diff --git a/packages/muelu/test/unit_tests/RegionMatrix.cpp b/packages/muelu/test/unit_tests/RegionMatrix.cpp
index 7487702fa3d..39a30961995 100644
--- a/packages/muelu/test/unit_tests/RegionMatrix.cpp
+++ b/packages/muelu/test/unit_tests/RegionMatrix.cpp
@@ -637,10 +637,10 @@ TEUCHOS_UNIT_TEST_TEMPLATE_4_DECL(RegionMatrix, RegionToCompositeMatrix, Scalar,
   typename values_type::HostMirror compositeValues_h = Kokkos::create_mirror_view(compositeValues);
   Kokkos::deep_copy(compositeValues_h, compositeValues);
 
-  TEST_EQUALITY(compositeEntries_h.extent(0), refEntries.extent(0));
+  TEST_EQUALITY(compositeEntries_h.extent(0), refEntries_h.extent(0));
   TEST_EQUALITY(compositeValues_h.extent(0),  refValues_h.extent(0));
   for(LO idx = 0; idx < compositeEntries_h.extent_int(0); ++idx) {
-    TEST_EQUALITY(compositeEntries_h(idx), refEntries(idx));
+    TEST_EQUALITY(compositeEntries_h(idx), refEntries_h(idx));
     TEST_FLOATING_EQUALITY(TST::magnitude(compositeValues_h(idx)),
                            TST::magnitude(refValues_h(idx)),
                            100*TMT::eps());
-- 
2.37.1 (Apple Git-137.1)

maartenarnst avatar Mar 09 '23 16:03 maartenarnst

Automatic mention of the @trilinos/muelu team

github-actions[bot] avatar Mar 09 '23 16:03 github-actions[bot]

Thanks for this thorough write-up. Yes, please open a PR. I don't think we make any promises that experimental code (MueLu_ENABLE_Experimental=ON) will build and pass on all platforms.

cgcgcg avatar Mar 09 '23 16:03 cgcgcg

Thanks, @cgcgcg. I've just made the PR. It's PR #11647.

maartenarnst avatar Mar 09 '23 18:03 maartenarnst

Hi @cgcgcg.

Thanks for approving the PR.

On our side, we disabled the experimental code (following your comment above). After this change, and with the fixes in the PR, MueLu and all tests compile. However, we do see a remaining runtime error in one of the tests. It occurs intermittently: some runs fail and others pass. It's in MueLu_UnitTestsTpetra_MPI_1, and it's always one of the ParameterListInterpreter_...BlockCrs_UnitTest tests. The type varies from one run to another.

The error message looks like

p=0: *** Caught standard std::exception of type 'std::runtime_error' :
/home/costmo-user/Trilinos/packages/ifpack2/src/Ifpack2_Relaxation_def.hpp:977:
Throw number = 379
Throw test that evaluated to true: info > 0
GETF2 or GETRI failed on = 1 diagonal blocks.
[FAILED]  (0.0528 sec) ParameterListInterpreter_double_int_longlong_Kokkos_Compat_KokkosHIPWrapperNode_BlockCrs_UnitTest

Do you have an idea about how to solve this issue? Thanks in advance!

maartenarnst avatar Mar 13 '23 13:03 maartenarnst

I think this is #11209.

cgcgcg avatar Mar 13 '23 14:03 cgcgcg

OK, thanks. That seems to be the same issue indeed.

maartenarnst avatar Mar 13 '23 14:03 maartenarnst

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions[bot] avatar Mar 13 '24 12:03 github-actions[bot]

@maartenarnst Just double checking: we can close this one, right?

cgcgcg avatar Apr 03 '24 20:04 cgcgcg

Ok to close indeed. Thanks for checking.

maartenarnst avatar Apr 03 '24 20:04 maartenarnst

Perfect!

cgcgcg avatar Apr 03 '24 20:04 cgcgcg