Trilinos icon indicating copy to clipboard operation
Trilinos copied to clipboard

Tpetra&KokkosKernels: Debug mode checks on SpGEMM fail

Open cgcgcg opened this issue 1 year ago • 8 comments

@trilinos/kokkos-kernels @trilinos/tpetra

Bug Report

In debug builds Kokkos Kernels recently added checks that the input matrices are sorted. Some TPLs require that, but currently Tpetra SpGEMM cannot use them.

When computing A*B on multiple ranks, Tpetra imports B into the colMap of A. The resulting Bmerged is not necessarily sorted and hence triggers the assert in Kokkos Kernels.

One such example is the test PanzerMiniEM_MiniEM-BlockPrec_RefMaxwell_reuse_MPI_4. I verified that in this case both A and B are sorted, but Bmerged is not.

cgcgcg avatar Mar 10 '23 21:03 cgcgcg

Some TPLs require that, but currently Tpetra SpGEMM cannot use them.

@cgcgcg By this, you mean that Tpetra's SpGEMM is unable to take advantage of sorted indices?

jhux2 avatar Mar 10 '23 23:03 jhux2

It seems that whenever Tpetra does an SpGEMM, no matter what TPLs are enabled, we end up with the KokkosKernels implementation (due to wrong type for offsets). Kokkos Kernels does not require sorted indices, only some TPLs do.

cgcgcg avatar Mar 10 '23 23:03 cgcgcg

@jhux2 It's because Tpetra CrsMatrix uses offset = size_t for the local matrix type, but all 3 TPLs only work when offset = int (offset being the rowptrs element type).

I'm thinking of adding the code in Tpetra MatrixMatrix to convert the offsets to int explicitly so that the TPLs can be called (just for performance).

brian-kelley avatar Mar 10 '23 23:03 brian-kelley

It seems that whenever Tpetra does an SpGEMM, no matter what TPLs are enabled, we end up with the KokkosKernels implementation (due to wrong type for offsets).

It's because Tpetra CrsMatrix uses offset = size_t for the local matrix type, but all 3 TPLs only work when offset = int (offset being the rowptrs element type).

@cgcgcg @brian-kelley Gotcha, makes sense.

jhux2 avatar Mar 10 '23 23:03 jhux2

@rppawlo Can you confirm that #11663 fixed this for EMPIRE?

cgcgcg avatar Mar 13 '23 16:03 cgcgcg

@cgcgcg - yes. This fixes the issues! Thanks!

rppawlo avatar Mar 13 '23 22:03 rppawlo

@brian-kelley I guess we can leave the issue open to track that something should still happen on the Tpetra side.

cgcgcg avatar Mar 13 '23 22:03 cgcgcg

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions[bot] avatar Mar 13 '24 12:03 github-actions[bot]

This issue was closed due to inactivity for 395 days.

github-actions[bot] avatar Apr 13 '24 12:04 github-actions[bot]