lapack
lapack copied to clipboard
New algorithms for computing Givens rotations
Closes #629
New algorithms for computing Givens rotations
@sergey-v-kuznetsov highlighted in #629 that the new Givens rotations operations may have lower accuracy than the ones that were in LAPACK up to release 3.9. This was verified after applying several rotations to a initial unitary matrix. This PR proposes:
- A new algorithm for computing complex Givens rotations.
- A slight modification in the algorithms for computing real-valued Givens rotations.
Both modifications target the improvement of the output's accuracy.
@langou and I are preparing a report with the numerical analysis and experiments comparing the different algorithms for computing Givens rotations. We will share the document here when it is ready to review.
Minor modifications
- Use
rtmin = sqrt( safmin )
instead ofrtmin = sqrt( safmin / epsilon )
. The first condition is sufficient to guarantee all real variables used in the intermediate steps of the new algorithm belong to the interval[safmin,safmax]
. - Set
rtmax
to eithersqrt( safmin/4 )
,sqrt( safmin/2 )
orsqrt( safmin )
. This variable depends on where it is in the algorithm. The value is the maximum possible in order that all real variables used in the intermediate steps of the new algorithm belong to the interval[safmin,safmax]
. - Eliminate the intermediate computations like
p = one / d
,uu = one / u
andvv = one / v
. These operations reduce the number of divisions in the code at the cost of possibly increasing the accumulation error. I am trying to improve accuracy, so I remove the intermediate operations at the cost of having additional floating-point divisions. - When
f = 0
, check ifreal(g) == 0
oraimag(g) == 0
to avoid unnecessaryABSSQ( g ); sqrt( g2 )
. This change reduces the accumulation error whenreal(g) == 0
(analogouslyaimag(g) == 0
) andaimag(g)**2
(analogouslyreal(g)**2
) cannot be stored in the respective finite precision. We choose not to use the intrinsic complexabs
because its implementation is compiler-dependent.
Major changes
The algorithm for computing complex Givens rotations was revisited. This is the new code in (c,z)ROTG
and (c,z)LARTG
for the unscaled part:
f2 = ABSSQ( f )
g2 = ABSSQ( g )
h2 = f2 + g2
! safmin <= f2 <= h2 <= safmax
if( f2 >= h2 * safmin ) then
! safmin <= f2/h2 <= 1, and h2/f2 is finite
c = sqrt( f2 / h2 )
r = f / c
rtmax = rtmax * 2
if( f2 > rtmin .and. h2 < rtmax ) then
! safmin <= sqrt( f2*h2 ) <= safmax
s = conjg( g ) * ( f / sqrt( f2*h2 ) )
else
s = conjg( g ) * ( r / h2 )
end if
else
! f2/h2 <= safmin may be subnormal, and h2/f2 may overflow.
! Moreover,
! safmin <= f2*f2 * safmax < f2 * h2 < h2*h2 * safmin <= safmax,
! sqrt(safmin) <= sqrt(f2 * h2) <= sqrt(safmax).
! Also,
! g2 >> f2, which means that h2 = g2.
d = sqrt( f2 * h2 )
c = f2 / d
if( c >= safmin ) then
r = f / c
else
! f2 / sqrt(f2 * h2) < safmin, then
! sqrt(safmin) <= f2 * sqrt(safmax) <= h2 / sqrt(f2 * h2) <= h2 * (safmin / f2) <= h2 <= safmax
r = f * ( h2 / d )
end if
s = conjg( g ) * ( f / d )
end if
- The worst-case scenario analysis shows this algorithm is more accurate than both the algorithms from 3.9.1 and 3.10.0.
- All real variables used in intermediate computations belong to the interval
[safmin,safmax]
.
Acknowledgements
Thanks to people that contributed in the discussions about this code:
- @sergey-v-kuznetsov, @ecanesc, @alilotfi90 and @langou,
- People from @BallisticLA team, especially prof. Jim Demmel.
Checklist
- [x] The documentation has been updated.
- [ ] Time measurements.
Codecov Report
Merging #631 (0c3cdcb) into master (79aa0f2) will not change coverage. The diff coverage is
0.00%
.
:exclamation: Current head 0c3cdcb differs from pull request most recent head 95b6e84. Consider uploading reports for the commit 95b6e84 to get more accurate results
@@ Coverage Diff @@
## master #631 +/- ##
=======================================
Coverage 0.00% 0.00%
=======================================
Files 1894 1894
Lines 184021 184035 +14
=======================================
- Misses 184021 184035 +14
Impacted Files | Coverage Δ | |
---|---|---|
SRC/cgeqrf.f | 0.00% <0.00%> (ø) |
|
SRC/cgerqf.f | 0.00% <0.00%> (ø) |
|
SRC/clarrv.f | 0.00% <0.00%> (ø) |
|
SRC/clartg.f90 | 0.00% <0.00%> (ø) |
|
SRC/dgeqrf.f | 0.00% <0.00%> (ø) |
|
SRC/dgerqf.f | 0.00% <0.00%> (ø) |
|
SRC/dlarrv.f | 0.00% <0.00%> (ø) |
|
SRC/dlartg.f90 | 0.00% <0.00%> (ø) |
|
SRC/sgeqrf.f | 0.00% <0.00%> (ø) |
|
SRC/sgerqf.f | 0.00% <0.00%> (ø) |
|
... and 6 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 79aa0f2...95b6e84. Read the comment docs.
From Jim: Do we know if the new clartg is consistent with what we proposed in section 2.3.5 of our exception handling document? Might be nice to avoid having yet another version in the future (or at least know what we need to change).
I have just updated the description (first comment) of this PR. @langou and I are preparing a report with the numerical analysis and experiments comparing the different algorithms for computing Givens rotations. We will share the document here when it is ready to review.
And here is the report: https://arxiv.org/abs/2211.04010. I am ready to merge it.