parsec
Improve data reuse on GPU
Description
When the matrix does not fit in GPU memory, performance degrades. We should find a way to improve data reuse on the GPU, either through more control or prioritization in the runtime, or through a better eviction strategy in the GPU queues. For instance, we could make sure that at least two of A, B, and C in a potrf_gemm task are already local to the GPU.
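To make the "at least two of A, B, and C are local" idea concrete, here is a minimal toy sketch of one possible selection policy. It is not PaRSEC code: `GemmTask`, `local_count`, `pick_next`, and the tile labels are all hypothetical names invented for illustration, and the set `resident` stands in for whatever the runtime knows about tiles currently held in GPU memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmTask:
    """Toy stand-in for one potrf_gemm task with three tile operands (hypothetical)."""
    a: str
    b: str
    c: str

    def tiles(self):
        return (self.a, self.b, self.c)


def local_count(task, resident):
    """Count how many of the task's three tiles are already on the GPU."""
    return sum(tile in resident for tile in task.tiles())


def pick_next(ready, resident, min_local=2):
    """Pick the next task to run on the GPU.

    Walk the ready list in its existing (priority) order and return the first
    task with at least `min_local` resident tiles. If no task qualifies, fall
    back to the head of the list so the device does not sit idle.
    """
    for task in ready:
        if local_count(task, resident) >= min_local:
            return task
    return ready[0] if ready else None


if __name__ == "__main__":
    resident = {"T(1,0)", "T(2,0)"}  # tiles assumed to be in GPU memory
    ready = [
        GemmTask("T(3,1)", "T(4,1)", "T(4,3)"),  # no tile is local
        GemmTask("T(2,0)", "T(1,0)", "T(2,1)"),  # A and B are local
    ]
    print(pick_next(ready, resident))
```

This only covers the scheduling side. An eviction-side variant would use the same residency information in reverse, for example by preferring to evict tiles that no ready task refers to.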