
Idea for a potential future direction: SuiteSparse's GPU backend?

Open cleo2801 opened this issue 2 months ago • 2 comments

Hi,

First off, thanks for creating this incredibly useful library!

I was recently using sparseqr and went through the latest SuiteSparse documentation. I noticed that their SPQR solver now has a mature GPU backend, which caught my attention because my high-resolution computations currently take a long time.

The current GPU alternatives in Python (like cupy.sparse.linalg.qr) are great for speed, but they don’t expose the column permutation vector E. In my work, that vector is actually the most critical piece of the output. Having a way to run this exact logic on a GPU would be a real game-changer for performance.
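
For context, here is roughly how I use the CPU path today; this follows the usage from the PySPQR README, and the random test matrix is just a stand-in for my real data:

```python
import scipy.sparse
import sparseqr

# Stand-in for my actual data: a random sparse matrix.
A = scipy.sparse.random(1000, 800, density=0.01, format="csc")

# sparseqr exposes the column permutation E alongside Q, R and the rank.
Q, R, E, rank = sparseqr.qr(A)

# The factorization satisfies Q @ R ~= A @ P, where P is built from E
# (this is the identity the README checks).
P = sparseqr.permutation_vector_to_matrix(E)
```

It is exactly that E vector that the CuPy-based alternatives don't give me.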

I understand this is not a simple feature request. I can imagine the amount of work required in the CFFI layer to handle GPU memory, device transfers, and linking against a CUDA-enabled SuiteSparse build, but I wanted to share this idea and see if GPU support for SPQR is something you’ve ever considered for sparseqr.

Thanks again for all your work!

cleo2801 avatar Oct 16 '25 13:10 cleo2801

I haven't looked into this, but it may be easy to add. Are you saying that there are CUDA-enabled SuiteSparse builds? I would assume that they expose a C function call for this.

yig avatar Oct 31 '25 17:10 yig

Hi yig,

Thanks for the reply.

I did a bit more checking. It looks like the CUDA-specific functions (SuiteSparseQR_cuda) expect pointers to device (GPU) memory, not host (CPU) memory. If that's the case, it probably means the CFFI layer would need to be updated to manually manage the data transfers: copying the matrix to the GPU, calling the function, and copying the results back.
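
If so, the flow in the wrapper would presumably look something like the sketch below (just the generic pycuda transfer pattern, not the actual sparseqr internals; the SuiteSparse call is left as a placeholder because I haven't verified the exact CUDA entry point or its signature):

```python
import numpy as np
import pycuda.autoinit   # creates a CUDA context
import pycuda.driver as cuda

# Host-side array standing in for one of the CSC buffers sparseqr already prepares.
values = np.array([1.0, 2.0, 3.0], dtype=np.float64)

# 1. Copy the matrix data to device memory.
d_values = cuda.mem_alloc(values.nbytes)
cuda.memcpy_htod(d_values, values)

# 2. Call the CUDA-enabled SuiteSparse routine with device pointers.
#    (placeholder -- the real call would go through the CFFI bindings,
#    and the function name/signature still needs to be confirmed)
# lib.SuiteSparseQR_cuda(..., int(d_values), ...)

# 3. Copy the results back so they can be returned as NumPy/SciPy objects.
result = np.empty_like(values)
cuda.memcpy_dtoh(result, d_values)
```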

I guess that this, combined with the complex build requirement (needing a CUDA-enabled SuiteSparse), makes it a much bigger task than a simple function swap.

Just wanted to share what I found. Thanks again!

cleo2801 avatar Nov 04 '25 09:11 cleo2801

The library is opened dynamically. It may be possible to check if the function exists and then import pycuda to allocate GPU memory. That would be a relatively small task.
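
Roughly like this (a sketch of the symbol check using ctypes; the actual binding goes through cffi, but the idea is the same, and the symbol name is the one you mentioned, unverified):

```python
import ctypes.util
from ctypes import CDLL

# Locate the SPQR shared library, e.g. libspqr.so on Linux.
libname = ctypes.util.find_library("spqr")

have_gpu_entry = False
if libname is not None:
    lib = CDLL(libname)
    # ctypes resolves symbols lazily, so hasattr() reports whether this
    # particular build was compiled with the CUDA entry point.
    have_gpu_entry = hasattr(lib, "SuiteSparseQR_cuda")

if have_gpu_entry:
    import pycuda.autoinit  # only pull in pycuda when the GPU path exists
```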

yig avatar Nov 12 '25 16:11 yig