lapack icon indicating copy to clipboard operation
lapack copied to clipboard

New algorithms for computing Givens rotations

Open weslleyspereira opened this issue 2 years ago • 3 comments

Closes #629

New algorithms for computing Givens rotations

@sergey-v-kuznetsov highlighted in #629 that the new Givens rotations operations may have lower accuracy than the ones that were in LAPACK up to release 3.9. This was verified after applying several rotations to a initial unitary matrix. This PR proposes:

  1. A new algorithm for computing complex Givens rotations.
  2. A slight modification in the algorithms for computing real-valued Givens rotations.

Both modifications target the improvement of the output's accuracy.

@langou and I are preparing a report with the numerical analysis and experiments comparing the different algorithms for computing Givens rotations. We will share the document here when it is ready to review.

Minor modifications

  1. Use rtmin = sqrt( safmin ) instead of rtmin = sqrt( safmin / epsilon ). The first condition is sufficient to guarantee all real variables used in the intermediate steps of the new algorithm belong to the interval [safmin,safmax].
  2. Set rtmax to either sqrt( safmin/4 ), sqrt( safmin/2 ) or sqrt( safmin ). This variable depends on where it is in the algorithm. The value is the maximum possible in order that all real variables used in the intermediate steps of the new algorithm belong to the interval [safmin,safmax].
  3. Eliminate the intermediate computations like p = one / d, uu = one / u and vv = one / v. These operations reduce the number of divisions in the code at the cost of possibly increasing the accumulation error. I am trying to improve accuracy, so I remove the intermediate operations at the cost of having additional floating-point divisions.
  4. When f = 0, check if real(g) == 0 or aimag(g) == 0 to avoid unnecessary ABSSQ( g ); sqrt( g2 ). This change reduces the accumulation error when real(g) == 0 (analogously aimag(g) == 0) and aimag(g)**2 (analogously real(g)**2) cannot be stored in the respective finite precision. We choose not to use the intrinsic complex abs because its implementation is compiler-dependent.

Major changes

The algorithm for computing complex Givens rotations was revisited. This is the new code in (c,z)ROTG and (c,z)LARTG for the unscaled part:

f2 = ABSSQ( f )
g2 = ABSSQ( g )
h2 = f2 + g2
! safmin <= f2 <= h2 <= safmax 
if( f2 >= h2 * safmin ) then
    ! safmin <= f2/h2 <= 1, and h2/f2 is finite
    c = sqrt( f2 / h2 )
    r = f / c
    rtmax = rtmax * 2
    if( f2 > rtmin .and. h2 < rtmax ) then
        ! safmin <= sqrt( f2*h2 ) <= safmax
        s = conjg( g ) * ( f / sqrt( f2*h2 ) )
    else
        s = conjg( g ) * ( r / h2 )
    end if
else
    ! f2/h2 <= safmin may be subnormal, and h2/f2 may overflow.
    ! Moreover,
    !  safmin <= f2*f2 * safmax < f2 * h2 < h2*h2 * safmin <= safmax,
    !  sqrt(safmin) <= sqrt(f2 * h2) <= sqrt(safmax).
    ! Also,
    !  g2 >> f2, which means that h2 = g2.
    d = sqrt( f2 * h2 )
    c = f2 / d
    if( c >= safmin ) then
        r = f / c
    else
        ! f2 / sqrt(f2 * h2) < safmin, then
        !  sqrt(safmin) <= f2 * sqrt(safmax) <= h2 / sqrt(f2 * h2) <= h2 * (safmin / f2) <= h2 <= safmax
        r = f * ( h2 / d )
    end if
    s = conjg( g ) * ( f / d )
end if
  • The worst-case scenario analysis shows this algorithm is more accurate than both the algorithms from 3.9.1 and 3.10.0.
  • All real variables used in intermediate computations belong to the interval [safmin,safmax].

Acknowledgements

Thanks to people that contributed in the discussions about this code:

  • @sergey-v-kuznetsov, @ecanesc, @alilotfi90 and @langou,
  • People from @BallisticLA team, especially prof. Jim Demmel.

Checklist

  • [x] The documentation has been updated.
  • [ ] Time measurements.

weslleyspereira avatar Oct 15 '21 20:10 weslleyspereira

Codecov Report

Merging #631 (0c3cdcb) into master (79aa0f2) will not change coverage. The diff coverage is 0.00%.

:exclamation: Current head 0c3cdcb differs from pull request most recent head 95b6e84. Consider uploading reports for the commit 95b6e84 to get more accurate results Impacted file tree graph

@@           Coverage Diff           @@
##           master     #631   +/-   ##
=======================================
  Coverage    0.00%    0.00%           
=======================================
  Files        1894     1894           
  Lines      184021   184035   +14     
=======================================
- Misses     184021   184035   +14     
Impacted Files Coverage Δ
SRC/cgeqrf.f 0.00% <0.00%> (ø)
SRC/cgerqf.f 0.00% <0.00%> (ø)
SRC/clarrv.f 0.00% <0.00%> (ø)
SRC/clartg.f90 0.00% <0.00%> (ø)
SRC/dgeqrf.f 0.00% <0.00%> (ø)
SRC/dgerqf.f 0.00% <0.00%> (ø)
SRC/dlarrv.f 0.00% <0.00%> (ø)
SRC/dlartg.f90 0.00% <0.00%> (ø)
SRC/sgeqrf.f 0.00% <0.00%> (ø)
SRC/sgerqf.f 0.00% <0.00%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 79aa0f2...95b6e84. Read the comment docs.

codecov[bot] avatar Oct 15 '21 20:10 codecov[bot]

From Jim: Do we know if the new clartg is consistent with what we proposed in section 2.3.5 of our exception handling document? Might be nice to avoid having yet another version in the future (or at least know what we need to change).

langou avatar Oct 17 '21 02:10 langou

I have just updated the description (first comment) of this PR. @langou and I are preparing a report with the numerical analysis and experiments comparing the different algorithms for computing Givens rotations. We will share the document here when it is ready to review.

weslleyspereira avatar Dec 13 '21 20:12 weslleyspereira

And here is the report: https://arxiv.org/abs/2211.04010. I am ready to merge it.

weslleyspereira avatar Nov 09 '22 16:11 weslleyspereira