KokkosBlas::gemm does not scale C by beta when A is empty
https://github.com/trilinos/Trilinos/issues/9583 reports incorrect results from MultiVector multiply, caused by an error in KokkosBlas::gemm.
KokkosBlas::gemm exits early when the input A has no entries. See https://github.com/kokkos/kokkos-kernels/blob/00189c0be23a70979aeaa162f0abd4c0e4d1c479/src/blas/KokkosBlas3_gemm.hpp#L142
But it exits before multiplying C by beta. If C is not empty, gemm should multiply C by beta before the early exit.
This use case arises when A has zero entries on some processor, and C is locally replicated on all processors. On the empty processor, C's values do not get multiplied by beta as they should.
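To make the expected behavior concrete, here is a minimal sketch of the semantics C = beta*C + alpha*A*B in plain C++. It is not the Kokkos Kernels implementation: `gemm_ref`, its row-major `std::vector` layout, and its argument order are hypothetical and chosen only for illustration. The point is that the quick return is taken only when C itself is empty; when the inner dimension k is zero, the product contributes nothing but C is still scaled by beta.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (illustration only, not KokkosBlas::gemm).
// Row-major storage: A is m x k, B is k x n, C is m x n.
// Computes C = beta * C + alpha * A * B.
void gemm_ref(std::size_t m, std::size_t n, std::size_t k,
              double alpha, const std::vector<double>& A,
              const std::vector<double>& B,
              double beta, std::vector<double>& C) {
  assert(A.size() == m * k);
  assert(B.size() == k * n);
  assert(C.size() == m * n);

  // Quick return only when C has no entries: there is nothing to update.
  if (m == 0 || n == 0) return;

  // Note: no early exit on k == 0. The inner loop simply runs zero times,
  // so sum stays 0 and C is still scaled by beta.
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t p = 0; p < k; ++p) {
        sum += A[i * k + p] * B[p * n + j];
      }
      C[i * n + j] = beta * C[i * n + j] + alpha * sum;
    }
  }
}
```

With an empty A and B (k = 0) and a 2 x 2 C, calling `gemm_ref(2, 2, 0, 1.0, A, B, 0.5, C)` should halve every entry of C, which is the result the empty processor needs to produce to stay consistent with the other processors.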
@ndellingwood @brian-kelley Looks like Karen already suggested a fix (commenting out the early exit). Can you make sure this is the final fix and push it into the repo here (and into Trilinos) if it has not been pushed yet?
I proposed a fix that keeps some quick returns based on the input dimensions but should handle the corner case of an empty A/B with a non-empty C matrix better.
This was fixed by PR #1091; it now needs to be ported to Trilinos.
@lucbv What is the target date for porting the fix to Trilinos? Once it is ported, we can merge my reproducer https://github.com/trilinos/Trilinos/pull/9819 . Thanks.
I was waiting a bit to see if the Kokkos/Kokkos Kernels release would happen quickly and would automatically take care of this. Let me check tomorrow and if it's not likely to happen this week I will create a Trilinos PR to fix the issue.