lapack
lapack copied to clipboard
Add level-3 BLAS triangular Sylvester equation solver
Description
Currently the only way to solve the triangular Sylvester equation AX + XB = C is via dtrsyl, which solely relies in level-2 BLAS. This PR intends to contribute a level-3 BLAS version.
The triangular Sylvester equation has been recognized to be prone to overflow. For that purpose, dtrsyl utilizes a scaling factor to represent the solution as (scale^{-1} * X) and solve the scaled equation AX + XB = scale * C. Due to the scaling factor, there is some flexibility in the representation of the solution. The proposed level-3 BLAS version computes the scaling factors based on upper bounds of blocks to enable level-3 BLAS. The scaling is typically slightly more aggressive so that an alternatively scaled final solution is computed. This is no problem as long as the scaling factor does not get flushed to zero. Indeed, the assumption in Anderson's original paper on solving scaled systems is that scale is in (0,1] and not [0,1].
In experiments the flushing occasionally happens for both dtrsyl and the proposed solver dtrsyl3, but dtrsyl3 reaches the stage earlier. In view of the intended changes of the handling of NaN, inf etc. I would be grateful to get some feedback on how to address the problem.
Our paper proposes to use integer scaling factors of the form 2^{(intscale} and solve AX + XB = 2^{intscale} * C. The vastly increased representational range rules out any flushing during the computation. The problem with the scaling factors affects dlatrs and dtrevc, too. There may be more routines. Is there an interest in using integer scaling factors and changing the function signature/documentation?
Alternatively, I would revise the PR and add a workaround for the flushing problem present in the attached code.
Once the decision on the scalings is taken, I will fix the other data types.
Checklist
- [ ] The documentation has been updated.
- [ ] If the PR solves a specific issue, it is set to be closed on merge.
Codecov Report
Base: 0.00% // Head: 0.00% // No change to project coverage :thumbsup:
Coverage data is based on head (
d83c3fd) compared to base (801ac2f). Patch coverage: 0.00% of modified lines in pull request are covered.
:exclamation: Current head d83c3fd differs from pull request most recent head 970a772. Consider uploading reports for the commit 970a772 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #651 +/- ##
=========================================
Coverage 0.00% 0.00%
=========================================
Files 1894 1904 +10
Lines 184078 186552 +2474
=========================================
- Misses 184078 186552 +2474
| Impacted Files | Coverage Δ | |
|---|---|---|
| SRC/clatrs3.f | 0.00% <0.00%> (ø) |
|
| SRC/ctrsyl3.f | 0.00% <0.00%> (ø) |
|
| SRC/dlarmm.f | 0.00% <0.00%> (ø) |
|
| SRC/dlatrs3.f | 0.00% <0.00%> (ø) |
|
| SRC/dtrsyl3.f | 0.00% <0.00%> (ø) |
|
| SRC/ilaenv.f | 0.00% <0.00%> (ø) |
|
| SRC/slarmm.f | 0.00% <0.00%> (ø) |
|
| SRC/slatrs3.f | 0.00% <0.00%> (ø) |
|
| SRC/strsyl3.f | 0.00% <0.00%> (ø) |
|
| SRC/zlatrs3.f | 0.00% <0.00%> (ø) |
|
| ... and 1 more |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
DLATRS3 Performance results originally presented at the BLIS retreat, and augmented with results for reference BLAS

DTRSYL3

The two solvers are a drop-in replacement of the existing BLAS-2 routines and dedicate quite a bit of code to computing a similar result as the BLAS-2 counterpart.The question on whether or not the scale factor should be of the same data type as the other data structures remains. The advantage of moving to the integer scale factor alpha = 2**j is that flushing of the scale factor to zero can no longer happen as a result of limitations of the floating-point range.
I would keep the representation by the BLAS-2 and the BLAS-3 version compatible. Then the disadvantage is a change of the function signature of LATRS and TRSYL. One option is to revisit the code later when an upgrade to LAPACK 4.0 is due and then switch the representation of the solution from (1/scale) * x to (1/(2**scale)) * x,where scale is int, later.