DLA-Future
Tridiagonal Solver (dist): Migrate permutation of local eigenvectors to GPU
In #967, a new "special" permutation was added. In the end it is just a local permutation, but it is derived by reasoning globally. Currently, it runs on MC for both the MC and GPU variants of the tridiagonal solver. To get it running on GPU, we have two main ways:
In order to re-use the local permutation:
- we can "preprocess" the permutation array on Backend::MC, extracting just the local parts and converting global indices to local indices
  - Problem: currently the local permutation can only deal with local matrices
    - Option 1: use local indices to access the local part
    - Option 2: create a new object (e.g. MatrixRef) that just refers to the local part (i.e. the new object is no longer aware of the distribution)
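The preprocessing step above can be sketched as follows. This is a minimal sketch, not the DLA-Future API: `Dist1D`, `is_local`, `to_local` and `extract_local_permutation` are hypothetical names, and it assumes a 1D block-cyclic distribution starting at rank 0, where the entries to keep are exactly those whose global index is owned by the calling rank.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a 1D block-cyclic distribution
// (an assumption for illustration, not a DLA-Future type).
struct Dist1D {
  std::size_t block_size;  // elements per tile
  std::size_t grid_size;   // number of ranks along this dimension
  std::size_t rank;        // coordinate of the calling rank
};

// True if the element with this global index is stored on the calling rank.
bool is_local(const Dist1D& d, std::size_t global) {
  assert(d.block_size > 0 && d.grid_size > 0);
  return (global / d.block_size) % d.grid_size == d.rank;
}

// Converts a global index owned by the calling rank into a local index.
std::size_t to_local(const Dist1D& d, std::size_t global) {
  assert(is_local(d, global));
  const std::size_t global_tile = global / d.block_size;
  const std::size_t local_tile = global_tile / d.grid_size;
  return local_tile * d.block_size + global % d.block_size;
}

// Keeps only the locally owned entries of a global permutation array,
// rewriting each of them as a local index (relative order is preserved).
std::vector<std::size_t> extract_local_permutation(
    const Dist1D& d, const std::vector<std::size_t>& global_perm) {
  std::vector<std::size_t> local_perm;
  for (std::size_t g : global_perm) {
    if (is_local(d, g))
      local_perm.push_back(to_local(d, g));
  }
  return local_perm;
}
```

With such a local index array, the existing local permutation could in principle be applied to the local part through either of the two options listed above.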
Permutation on GPU: Currently it is implemented by passing a "simplified" distribution (pointer + horizontal and vertical distance between tiles)
- Since we are going to support "randomly" placed allocations, fixed distances between tiles can no longer be assumed
  - (preferred) Option 1: send a vector of pointers, where each element points to the beginning of a tile
  - (Option 2: force the layout on the matrix used)
- @rasolca does not like how the position is currently computed
  - It is going to be implemented differently (currently a CUDA thread works on a single element)
  - cudaMemcpy is not an alternative since it would result in too many small copy operations
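To illustrate the preferred Option 1, here is a minimal host-side sketch of the index computation when a matrix is described by a vector of tile pointers instead of a base pointer plus fixed distances. `TiledView`, `element` and `permute_columns` are hypothetical names, not DLA-Future code. The sketch assumes column-major tiles of equal size, the same leading dimension for every tile, and the convention that output column `j` is input column `perm[j]`. On GPU the pointer vector would have to live in device memory, and the loop body would be distributed over threads in whatever way the new implementation chooses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of a local matrix whose tiles are allocated independently.
struct TiledView {
  std::size_t tile_rows;       // rows per tile (all tiles assumed full)
  std::size_t tile_cols;       // columns per tile
  std::size_t ntiles_rows;     // number of tile rows
  std::size_t ntiles_cols;     // number of tile columns
  std::size_t ld;              // leading dimension of every tile
  std::vector<double*> tiles;  // tiles[tj * ntiles_rows + ti] = start of tile (ti, tj)
};

// Address of element (i, j): find the tile pointer, then offset inside the tile.
double* element(const TiledView& v, std::size_t i, std::size_t j) {
  assert(i < v.tile_rows * v.ntiles_rows && j < v.tile_cols * v.ntiles_cols);
  double* tile = v.tiles[(j / v.tile_cols) * v.ntiles_rows + (i / v.tile_rows)];
  return tile + (j % v.tile_cols) * v.ld + (i % v.tile_rows);
}

// Column permutation: out(:, j) = in(:, perm[j]).
void permute_columns(const TiledView& in, const TiledView& out,
                     const std::vector<std::size_t>& perm) {
  const std::size_t rows = in.tile_rows * in.ntiles_rows;
  for (std::size_t j = 0; j < perm.size(); ++j) {
    for (std::size_t i = 0; i < rows; ++i)
      *element(out, i, j) = *element(in, i, perm[j]);
  }
}
```

Because every tile is reached through its own pointer, nothing in this lookup depends on where the tiles were allocated relative to each other.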