DLA-Future
TriSolver (dist): move sorting permutation from CPU to GPU
This PR aims to drop the custom `permuteJustLocal` and, by transforming the permutation indices, reduce its use case to one that the existing local permutation implementation, available for both backends, can handle.
- [ ] Cleanup implementation
- [ ] It might be possible to drop i5 (for distributed implementation)
- [ ] What to do about the `permute` API? Should we separate the "distributed" use case (at least formally), or is it enough to review the assumptions?
- [ ] Evaluate if it is worth switching to `MatrixRef` (just for the code changed)
- [x] Make it work on GPU
- [ ] Add a unit test for the new use-case with distributed matrices
Notes
Since PR #967, each rank sorts eigenvalues by type (upper, dense, lower, deflated) independently of the other ranks. At the time of that PR, for convenience, we opted to perform the sort with a custom permutation procedure, `permuteJustLocal`, that was able to deal with global indices but applied the permutation only to the local part. In addition, `permuteJustLocal` was implemented only on CPU, because a GPU implementation would have required a major effort that was not worthwhile, given that this type of operation is inherently inefficient on GPU.
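To illustrate the idea of transforming the permutation indices, here is a minimal sketch, not the code in this PR. It assumes a 1D block-cyclic layout with source rank 0 and a global permutation that only moves elements within the same rank (as is the case when each rank sorts independently). `Layout1D`, `ownerRank`, `localIndex` and `toLocalPermutation` are hypothetical names used only for this example; they are not part of the DLA-Future API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D block-cyclic layout (source rank 0), for illustration only.
struct Layout1D {
  std::size_t size;        // global number of elements
  std::size_t block_size;  // elements per block
  std::size_t nranks;      // number of ranks along this dimension
};

// Rank owning global index g.
std::size_t ownerRank(const Layout1D& l, std::size_t g) {
  return (g / l.block_size) % l.nranks;
}

// Index of global element g inside the local part of its owner.
std::size_t localIndex(const Layout1D& l, std::size_t g) {
  return (g / l.block_size / l.nranks) * l.block_size + g % l.block_size;
}

// Convert a permutation expressed with global indices into one expressed with
// local indices for `rank`. Assumption: every locally owned position maps to an
// element owned by the same rank, so no communication is needed.
std::vector<std::size_t> toLocalPermutation(const Layout1D& l, std::size_t rank,
                                            const std::vector<std::size_t>& perm_global) {
  assert(perm_global.size() == l.size);
  std::vector<std::size_t> perm_local;
  // Owned global indices are visited in increasing order, which matches
  // increasing local index, so push_back fills positions 0, 1, 2, ...
  for (std::size_t g = 0; g < l.size; ++g) {
    if (ownerRank(l, g) != rank)
      continue;
    assert(ownerRank(l, perm_global[g]) == rank);
    perm_local.push_back(localIndex(l, perm_global[g]));
  }
  return perm_local;
}
```

With 8 elements, block size 2 and 2 ranks, rank 0 owns global indices 0, 1, 4, 5. The resulting local permutation uses only local indices, so it could in principle be passed to a purely local permutation routine on either backend.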
cscs-ci run
Back to draft due to frequent hangs on santis
cscs-ci run