
GPU Support

Open artv3 opened this issue 5 years ago • 4 comments

Hi,

I was wondering if GPU support (Nvidia/AMD) is on the roadmap?

artv3 avatar Jul 09 '20 15:07 artv3

Hi,

We have some initial support for Nvidia GPUs, using cuBLAS/cuSOLVER, and SLATE for one of the solver algorithms (for the direct solver, not for the preconditioners or the iterative solvers). In an older branch we also used MAGMA (instead of cuBLAS/cuSOLVER), because it provides variable-sized batched GEMM.

There are currently no CUDA kernels in the code, since we can get most of the performance from BLAS/LAPACK. However, I'm planning to add a few small CUDA kernels to avoid CPU<->GPU data movement, which would also let us drop our use of CUDA managed memory (`cudaMallocManaged`) and, in turn, make the code easier to 'hipify'.

To test the current cuBLAS/cuSOLVER support, add `-DTPL_ENABLE_CUBLAS=ON -DCUDAToolkit_ROOT=/some/path` to your CMake command, and `-DTPL_ENABLE_SLATE=ON` for SLATE support in distributed memory (as a ScaLAPACK alternative).
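Put together, a configure line might look like the following (a sketch only: the toolkit path is a placeholder, and the build type flag is just a common addition, not something required for GPU support):

```shell
# Hypothetical configure invocation; replace /some/path with your
# actual CUDA toolkit location and adjust the source directory.
cmake ../STRUMPACK \
  -DCMAKE_BUILD_TYPE=Release \
  -DTPL_ENABLE_CUBLAS=ON \
  -DCUDAToolkit_ROOT=/some/path \
  -DTPL_ENABLE_SLATE=ON
```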

I'd be happy to discuss more.

pghysels avatar Jul 09 '20 16:07 pghysels

Are you working on some application or are you thinking about RAJA?

pghysels avatar Jul 14 '20 17:07 pghysels

I was just thinking about it from a user's point of view. If my application assembles a matrix, how does the hand-off to STRUMPACK work? For example, can I provide device pointers, or would it do the host-device copy for me?

artv3 avatar Jul 16 '20 22:07 artv3

We don't have an interface that takes device pointers. There are several steps in the algorithms that we cannot do on the GPU yet, such as reordering the (sparse) matrix to reduce fill-in during the factorization, reordering for numerical stability, and the symbolic analysis phase. For the orderings we call external libraries such as METIS, Scotch, and MC64.

pghysels avatar Jul 28 '20 17:07 pghysels