Fix the behavior of extent(r) for r > rank in TeamGEMM
May partially resolve #2622
After looking at the behavior of older versions, I found that
extent(r) for r > rank should return 1.
This PR restores that behavior.
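For readers following along, here is a minimal sketch of the rule described above. `Shape` is a hypothetical stand-in written for illustration only; it is not the Kokkos `View` class and not the code changed in this PR. It assumes the intended rule is "return the stored extent inside the rank, and 1 beyond it".

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a view's shape (NOT the Kokkos View class).
template <std::size_t Rank>
struct Shape {
  std::array<std::size_t, Rank> dims;

  // Stored extent for r < Rank; 1 for any dimension beyond the rank.
  // Parentheses are added for readability, not because they are required.
  constexpr std::size_t extent(std::size_t r) const {
    return (r < Rank) ? dims[r] : 1;
  }
};

// Compile-time checks of the rule on a rank-2 and a rank-0 shape.
static_assert(Shape<2>{{3, 4}}.extent(1) == 4, "inside the rank");
static_assert(Shape<2>{{3, 4}}.extent(2) == 1, "beyond the rank");
static_assert(Shape<0>{}.extent(0) == 1, "rank-0 shape");
```

With this rule, generic code that queries a fixed set of dimensions can treat a lower-rank operand as if its missing dimensions had length 1.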
I don't know the precedence of || and the ternary operator offhand. Could you please insert parens (even if not technically necessary)?
Sure
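As a side note on the precedence question: in C++, `||` binds more tightly than `?:`, so `a || b ? x : y` groups as `(a || b) ? x : y`. The sketch below uses hypothetical function names and is not taken from the PR diff; it only illustrates the grouping and why explicit parentheses help the reader.

```cpp
// Hypothetical helpers, for illustration only (not code from this PR).

// Without parentheses: the compiler groups this as (a || b) ? 1 : 2.
constexpr int pick_unparenthesized(bool a, bool b) {
  return a || b ? 1 : 2;
}

// Same meaning, with the grouping spelled out.
constexpr int pick_explicit(bool a, bool b) {
  return (a || b) ? 1 : 2;
}

// The other possible reading. Both 1 and 2 convert to true,
// so this expression is true (1) for every input.
constexpr int pick_other_reading(bool a, bool b) {
  return a || (b ? 1 : 2);
}

// The two readings differ when both operands are false.
static_assert(pick_unparenthesized(false, false) == 2, "groups as (a || b) ? 1 : 2");
static_assert(pick_explicit(false, false) == 2, "explicit grouping agrees");
static_assert(pick_other_reading(false, false) == 1, "other reading differs");
```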
Can we add a small unit test that exhibits the current issue and shows that the proposed PR fixes it?
If this is related, it means that there is a GEMM operation on a rank-0 view. I can add a test for that in https://github.com/kokkos/kokkos-kernels/pull/2628
I'm building the Ifpack2 tests to check the impact of this PR, thanks for tracking this down @yasahi-hpc
Unfortunately I still see the failures in Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4 when building Trilinos with this PR, tested in a Cuda build
Thanks a lot for testing and reporting. That is unfortunate. I will continue investigating.
Hi @yasahi-hpc , I retested with 9f1b00a904b44828cefdcd54f9f7c46908c7b27a in a Cuda build but I am still seeing failures with the Ifpack2_BlockTriDiContainerUnitAndPerfTests_MPI_4 test
Thank you for testing. That is unfortunate.
I will investigate on my side. First of all, I need to build Trilinos in my environment.
@lucbv Can I close this and resume work on #2628 and #2651?
Yes, I think we can. Sorry we took a while to make the final decision on how to move forward. The code changes in GEMM led to issues with CUDA and HIP, so we might want to re-introduce things more slowly, with more, smaller PRs...
Thank you for the information. So it is better not to work on #2628 and #2651.