
Does STRUMPACK support sparse LDLT factorization for symmetric matrices?

learning-chip opened this issue 3 years ago • 6 comments

From the Sparse Direct Solver section, it seems STRUMPACK uses a general sparse LU factorization (when no compression is used). Is there an LDLT implementation for symmetric matrices (which would save roughly half the compute and memory)? Thanks!
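
For reference, a back-of-the-envelope count for a dense $n \times n$ kernel (my own numbers, not from the STRUMPACK docs):

$$
\text{LU: } \approx \tfrac{2}{3}n^3 \text{ flops},\ n^2 \text{ storage}; \qquad LDL^T\!: \approx \tfrac{1}{3}n^3 \text{ flops},\ \tfrac{1}{2}n^2 \text{ storage},
$$

hence the factor of about two in both compute and memory.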

learning-chip, May 24 '22 08:05

Hi, no, unfortunately we don't (yet) support LDLT.

pghysels, May 25 '22 15:05

Hi @pghysels, I'd like to better understand the steps needed to develop a symmetric LDLT from the existing general LU. Thanks!

From the book Multifrontal Methods: Parallelism, Memory Usage and Numerical Aspects (written by a MUMPS author; MUMPS has both BLR LU and LDLT implementations), the differences at the abstract algorithm level should be:

  • The elimination tree structure is simpler: in the symmetric case, the transitive reduction of the graph of explicit dependencies is the symmetric elimination tree (Chapter 1.2.3, Elimination graph structures)
  • 2x2 pivots are needed to maintain the symmetry of the frontal matrices (Chapter 1.3.2, Numerical accuracy and pivoting)
  • The calls to BLAS kernels are different (Chapter 2.1.4, Factorization of type 1 nodes)

The above is for full-rank exact factorization. Regarding the low-rank compression of frontal matrices, I assume that the compression algorithm (RRQR) should stay exactly the same (symmetric vs. general)?
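
On the pivoting point in the list above: the symmetric indefinite factorization (Bunch-Kaufman style, as in LAPACK's sytrf) has the form

$$
P A P^T = L D L^T,
$$

where $L$ is unit lower triangular and $D$ is block diagonal with 1x1 and 2x2 blocks; the 2x2 blocks are what allow stable pivoting without destroying symmetry.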

What I am not sure about is how those textbook modifications map to STRUMPACK's code structure:

  • I would naively assume that the low-rank compression code (src/sparse/fronts/FrontalMatrix*.cpp) stays the same for symmetric and nonsymmetric matrices?
  • The major changes should be made in src/sparse/EliminationTree(MPI).hpp and the matrix container (src/sparse/CSRMatrix(MPI).cpp), I think?
  • I notice a symm_sparse bool attribute defined in CSRMatrix and CompressedSparseMatrix, but I don't see how this flag is used during factorization.

learning-chip, Jul 19 '22 11:07

STRUMPACK already assumes a symmetric sparsity pattern, so no changes would be needed in the symbolic phase. Because of that assumption, we need to symmetrize the pattern before passing it to, say, METIS, or before doing the symbolic phase. The user can specify that the pattern is already symmetric with the symm_sparse input, but this doesn't really save much.
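
For illustration, here is a minimal sketch of what that pattern symmetrization amounts to, building the pattern of A + A^T from a CSR pattern. This is a hypothetical standalone routine, not the actual STRUMPACK code (which lives inside its CSRMatrix class):

```cpp
#include <set>
#include <vector>

// Sketch: build the sparsity pattern of A + A^T from the CSR pattern of A
// (row pointers ptr, column indices ind). Illustrative only.
void symmetrize_pattern(int n,
                        const std::vector<int>& ptr,
                        const std::vector<int>& ind,
                        std::vector<int>& sptr,
                        std::vector<int>& sind) {
  std::vector<std::set<int>> rows(n);
  for (int i = 0; i < n; i++)
    for (int k = ptr[i]; k < ptr[i+1]; k++) {
      rows[i].insert(ind[k]);   // entry (i,j) of A
      rows[ind[k]].insert(i);   // mirrored entry (j,i) from A^T
    }
  sptr.assign(n+1, 0);
  sind.clear();
  for (int i = 0; i < n; i++) {
    sptr[i+1] = sptr[i] + (int)rows[i].size();
    sind.insert(sind.end(), rows[i].begin(), rows[i].end());
  }
}
```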

I think the main changes would be to replace _getrf with _sytrf, and then replace the _trsm calls (I'm not sure by what). See here and the lines below that. Then the forward/backward solve needs to be updated accordingly. The distributed-memory code (calling ScaLAPACK p_getrf) needs to be modified as well, as does the GPU code.
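
To make the routine swap concrete, here is a rough dense sketch using LAPACKE (a hypothetical standalone example with column-major storage; STRUMPACK actually goes through its own BLAS/LAPACK wrapper layer):

```cpp
#include <vector>
#include <lapacke.h>

// Sketch: factor an n x n dense front A and solve with nrhs right-hand
// sides in B, unsymmetric (LU) vs symmetric indefinite (LDL^T).
void factor_and_solve(int n, int nrhs, std::vector<double>& A,
                      std::vector<double>& B, bool symmetric) {
  std::vector<lapack_int> ipiv(n);
  if (!symmetric) {
    // LU with partial pivoting, then triangular solves with L and U.
    LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, A.data(), n, ipiv.data());
    LAPACKE_dgetrs(LAPACK_COL_MAJOR, 'N', n, nrhs,
                   A.data(), n, ipiv.data(), B.data(), n);
  } else {
    // LDL^T with Bunch-Kaufman (1x1 and 2x2) diagonal pivoting.
    LAPACKE_dsytrf(LAPACK_COL_MAJOR, 'L', n, A.data(), n, ipiv.data());
    LAPACKE_dsytrs(LAPACK_COL_MAJOR, 'L', n, nrhs,
                   A.data(), n, ipiv.data(), B.data(), n);
  }
}
```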

Keeping the low rank algorithms symmetric will be more complicated, I'm afraid.

There is also the extend-add phase, which passes the Schur complement (the 22 block of the front) to the parent in the tree. This phase currently assumes we have the whole front, not just the lower/upper triangular part.
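
To spell that out, each front has the block form

$$
F = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}, \qquad S = F_{22} - F_{21} F_{11}^{-1} F_{12},
$$

and the Schur complement $S$ is what gets extend-added into the parent front. In a symmetric code $F_{12} = F_{21}^T$ and $S = S^T$, so only one triangle of $S$ would be stored and passed up, which is why the extend-add would need changes.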

pghysels, Jul 20 '22 00:07

Thanks for the explanation!

> Keeping the low rank algorithms symmetric will be more complicated, I'm afraid.

Could you elaborate more on this point?

From the MUMPS BLR paper, the only comments on symmetric BLR factorization are:

  • "Note that Algorithm 1 (Dense BLR LU factorization) can be easily adapted to the symmetric (definite and indefinite) case, where operations on U are not performed in order to take advantage of the symmetry."
  • "The flexibility of the BLR format makes it compatible with standard threshold partial pivoting, ... In the case of symmetric matrices, only diagonal pivoting is allowed in order to maintain symmetry, so that pivots must be part of AI,I ; two-by-two pivots are used when necessary."

At least for the BLR case, the modification looks minor? For the hierarchical compression formats involving tree structures (HSS, HODLR), the changes will be more complicated indeed.
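
Concretely, the symmetric variant of the blocked factorization performs the trailing update only on the lower-triangular blocks,

$$
A_{i,j} \leftarrow A_{i,j} - L_{i,k}\, D_k\, L_{j,k}^T, \qquad k < j \le i,
$$

where in the BLR case the $L_{i,k}$ may be low-rank blocks and $D_k$ is the (block-)diagonal pivot of panel $k$; the mirrored updates on the strictly upper triangle are simply skipped.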

learning-chip, Jul 20 '22 01:07

> then replace the _trsm calls (I'm not sure by what).

I think trsm is the only BLAS triangular solve routine; the triangular factors should not be aware of the symmetry of the original matrix.

The one for LDLT should be sytrs, which works for both a diagonal and a 2x2 block-diagonal D.

learning-chip, Jul 20 '22 01:07

The problem is that dsytrf does not return the triangular factor explicitly, so we cannot use trsm directly. sytrs solves a full system, i.e. it solves with both the triangular factor and its transpose.
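
Roughly (per the LAPACK documentation), dsytrf with uplo='L' computes $A = L D L^T$, where $L$ is a product of permutation matrices and unit lower triangular block transforms encoded via ipiv, so there is no single triangular matrix one could hand to trsm. dsytrs then solves with the whole sequence, i.e. with $L$, $D$, and $L^T$, in one call.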

pghysels, Jul 20 '22 20:07