DLA-Future TriSolver (dist): move sorting permutation from CPU to GPU

Open albestro opened this issue 10 months ago • 4 comments

This PR aims to drop the custom permuteJustLocal. It does so by transforming the permutation indices so that this use case reduces to one the existing local permutation implementation, which is available for both backends, can handle.

[ ] Clean up implementation
[ ] It might be possible to drop i5 (for distributed implementation)
[ ] What to do about the permute API? Should we separate the "distributed" use case (at least formally), or is it enough to review the assumptions?
[ ] Evaluate if it is worth switching to MatrixRef (just for the code changed)
[x] Make it work on GPU
[ ] Add a unit test for the new use-case with distributed matrices

Notes

Since PR #967, each rank sorts eigenvalues by type (upper, dense, lower, deflated) independently of the other ranks. At the time of that PR, for convenience, we opted to perform the sort with a custom permutation procedure, permuteJustLocal, that was able to deal with global indices but applied the permutation only to the local part. In addition, permuteJustLocal was implemented only on CPU, because a GPU implementation would have required a major effort that was not worthwhile given the inherently GPU-inefficient type of operations.
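To illustrate the idea of turning global permutation indices into local ones, here is a minimal sketch in Python. It is not the DLA-Future implementation: it assumes a 1D block-cyclic distribution with block size `nb` over `nranks` ranks starting at rank 0, and the helper names (`owner`, `global_to_local`, `local_permutation`) are hypothetical. It also assumes that `perm[dst]` gives the global source index for global destination `dst`, and that every element only moves within the rank that owns it, which is what sorting independently on each rank implies.

```python
def owner(g, nb, nranks):
    """Rank owning global index g (assumed 1D block-cyclic, source rank 0)."""
    return (g // nb) % nranks


def global_to_local(g, nb, nranks):
    """Local index of global index g on its owning rank."""
    return (g // (nb * nranks)) * nb + g % nb


def local_permutation(perm, rank, nb, nranks):
    """Extract the rank-local permutation from a global one.

    perm[dst] is the global source index for global destination dst.
    Destinations owned by `rank` are visited in increasing global order,
    which is also increasing local order, so the result is indexed by
    local destination and contains local source indices.
    """
    local_perm = []
    for dst, src in enumerate(perm):
        if owner(dst, nb, nranks) != rank:
            continue  # destination lives on another rank
        if owner(src, nb, nranks) != rank:
            # Assumption violated: this sketch never communicates.
            raise ValueError(f"source {src} for destination {dst} is not local")
        local_perm.append(global_to_local(src, nb, nranks))
    return local_perm


# 8 elements, block size 2, 2 ranks:
# rank 0 owns globals 0, 1, 4, 5 and rank 1 owns globals 2, 3, 6, 7.
perm = [4, 0, 3, 2, 5, 1, 7, 6]
print(local_permutation(perm, rank=0, nb=2, nranks=2))  # [2, 0, 3, 1]
print(local_permutation(perm, rank=1, nb=2, nranks=2))  # [1, 0, 3, 2]
```

Once the indices are expressed locally like this, an ordinary local permutation routine can apply them with no knowledge of the distribution, which is the property that lets the same code path serve both CPU and GPU.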

Apr 09 '24 10:04 albestro

cscs-ci run

Apr 09 '24 10:04 albestro

cscs-ci run

Apr 09 '24 12:04 albestro

cscs-ci run

Apr 09 '24 15:04 albestro

cscs-ci run

Apr 16 '24 16:04 albestro

cscs-ci run

May 27 '24 08:05 albestro

cscs-ci run

May 27 '24 08:05 albestro

cscs-ci run

May 27 '24 09:05 albestro

cscs-ci run

May 27 '24 10:05 albestro

cscs-ci run

May 28 '24 10:05 rasolca

Back to draft due to frequent hangs on santis

May 29 '24 10:05 rasolca

cscs-ci run

Jun 21 '24 08:06 rasolca
