Trilinos
Trilinos copied to clipboard
Tpetra: compilation errors in build with MKL enabled
@trilinos/tpetra @trilinos/kokkos-kernels
Description
What went wrong?
We are seeing compilation errors in a Trilinos build with MKL enabled. The compilation errors seem to follow from changes that were introduced in the recent kokkos snapshot in PR #10962.
The error messages look like
In file included from ~/Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:49,
from ~/Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_numeric_spec.hpp:59,
from ~/Trilinos/packages/kokkos-kernels/src/sparse/KokkosSparse_spgemm_numeric.hpp:48,
from ~/home/costmo-user/Trilinos/packages/kokkos-kernels/src/sparse/KokkosSparse_spgemm.hpp:47,
from ~/Trilinos/packages/tpetra/core/ext/TpetraExt_MatrixMatrix_def.hpp:66,
from ~/Trilinos/packages/tpetra/core/ext/TpetraExt_MatrixMatrix.cpp:49:
~/Trilinos/packages/kokkos-kernels/src/sparse/KokkosSparse_Utils_mkl.hpp:
In instantiation of 'class KokkosSparse::Impl::MKLSparseMatrix<long long int>':
~/Trilinos/packages/kokkos-kernels/src/sparse/KokkosSparse_Utils_mkl.hpp:126:58: error: static assertion failed: Scalar type used in MKLSparseMatrix<value_type> is NOT supported by MKL
126 | static_assert(mkl_is_supported_value_type<value_type>::value,
Do you have an idea what might fix things?
It seems Tpetra wants to instantiate several classes using int
and long long int
as scalar type. This appears to cause problems here in sparse GEMM calls when MKL is enabled. It seems that kokkos-kernels is deciding whether or not to direct sparse GEMM calls to MKL by using runtime conditionals in src/sparse/impl/KokkosSparse_spgemm_numeric_spec.hpp
. But, PR #10962 changed how function calls are directed to MKL and introduced a compile-time assert to check that the scalar type is supported by MKL (which int
and long long int
, seemingly, are not).
One way of solving this issue might be to replace the compile-time assert with runtime conditionals. Changing KokkosSparse_Utils_mkl.hpp
as in the following patch solved the compilation errors in our build:
From d79bd6181a5449d4f558276165af4766f671349c Mon Sep 17 00:00:00 2001
From: Maarten Arnst <[email protected]>
Date: Sat, 17 Sep 2022 11:13:34 +0200
Subject: [PATCH] Change compile-time assert into run-time check in
MKLSparseMatrix.
---
.../src/sparse/KokkosSparse_Utils_mkl.hpp | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/packages/kokkos-kernels/src/sparse/KokkosSparse_Utils_mkl.hpp b/packages/kokkos-kernels/src/sparse/KokkosSparse_Utils_mkl.hpp
index b9eb3a9bd25..4abeeca88c1 100644
--- a/packages/kokkos-kernels/src/sparse/KokkosSparse_Utils_mkl.hpp
+++ b/packages/kokkos-kernels/src/sparse/KokkosSparse_Utils_mkl.hpp
@@ -123,16 +123,16 @@ template <typename value_type>
class MKLSparseMatrix {
sparse_matrix_t mtx;
- static_assert(mkl_is_supported_value_type<value_type>::value,
- "Scalar type used in MKLSparseMatrix<value_type> is NOT "
- "supported by MKL");
-
public:
inline MKLSparseMatrix(sparse_matrix_t mtx_) : mtx(mtx_) {}
// Constructs MKL sparse matrix from KK sparse views (m rows x n cols)
inline MKLSparseMatrix(const MKL_INT num_rows, const MKL_INT num_cols,
- MKL_INT *xadj, MKL_INT *adj, value_type *values);
+ MKL_INT *xadj, MKL_INT *adj, value_type *values) {
+ throw std::runtime_error(
+ "Scalar type used in MKLSparseMatrix<value_type> is NOT "
+ "supported by MKL");
+ }
// Allows using MKLSparseMatrix directly in MKL calls
inline operator sparse_matrix_t() const { return mtx; }
@@ -140,7 +140,11 @@ class MKLSparseMatrix {
// Exports MKL sparse matrix contents into KK views
inline void export_data(MKL_INT &num_rows, MKL_INT &num_cols,
MKL_INT *&rows_start, MKL_INT *&columns,
- value_type *&values);
+ value_type *&values) {
+ throw std::runtime_error(
+ "Scalar type used in MKLSparseMatrix<value_type> is NOT "
+ "supported by MKL");
+ }
inline void destroy() {
KOKKOSKERNELS_MKL_SAFE_CALL(mkl_sparse_destroy(mtx));
--
2.25.1
Can you post your configure?
Hi @csiefer2. Thanks for looking into this!
This is our configure:
It looks like GitHub doesn't allow .json files to be uploaded. So that's why I added .txt to the filename.
Hi @csiefer2. Thanks again for solving the companion issue about the Kokkos initialisation. Just wanted to ask if you should already have had a chance to look into this MKL issue? Thank you very much in advance!
@maartenarnst We don't test the OneAPI intel compilers, so I'm not at all surprised it doesn't work. I have a system with oneAPI installed, so I'll add MKL to that and see if I can get it to work.
Doesn't the class need a specialization for the int types? Seems this isn't a oneAPI issue, it's just enabling MKL Sparse matrices?
https://github.com/trilinos/Trilinos/blob/develop/packages/kokkos-kernels/src/sparse/KokkosSparse_Utils_mkl.hpp
@jjellio One of the ATDM tests uses MKL with Intel 19, but not with the KokkosKernels MKL interfaces, so that could be the thing.
Building now...
@maartenarnst I can reproduce now and it looks like a "something configure should have noticed and told you that you can't build this combination of options" kind of problem.
I'll try to get the @trilinos/kokkos-kernels folks in on this.
@lucbv agreed to take a look.
Thanks, to all of you, for looking into this!
The patch in the original post may go into the direction that @jjellio is suggesting. The patch is basically providing a default implementation for MKLSparseMatrix
. This default implementation throws runtime errors (replacing the static assert that is currently in the code). The specialisations for all the scalar types that MKL supports are left unchanged.
Just wanted to follow up on this issue. @lucbv, have you already had a chance to take a look at it? It seems the issue was also encountered in #11189. Thanks.
Hi, I am also facing the exact same issue with MKL and Intel oneAPI compilers. However I can add that it was working fine as of 74f5468
. Is there another workaround in the meantime?
Adding @brian-kelley
@lucbv have you had a chance to look at this? It is starting to become critical for @hkthorn.
@maartenarnst I replicated this issue and verified that your changes fix it. They look good to me; I opened #11278 with them.
@ndellingwood I put you as a reviewer on that. If you agree it looks good, I will open an equivalent PR against KokkosKernels develop so this can go in 3.7.01.
@brian-kelley thanks for the fast PR! Approved with automerge label applied
It appears that a few PRs have been merged related to this. Can this issue be closed or are there other required pieces?
@ccober6 Xyce testing of this change in the Trilinos development branch happens once Charon blesses SHA-1 tag through their Trilinos development branch testing. I will add a comment when the chain of events produces a clean Xyce build/test against Trilinos development so this can be closed. Is that okay?
@hkthorn Yes, let's make sure it works for you. What is the time frame until the Xyce build/test?
@ccober6 Our Trilinos development testing is up and running again. Thanks!!!
@hkthorn Yeah!! Glad this is working!