lapack icon indicating copy to clipboard operation
lapack copied to clipboard

LAPACK slaln2.f: more accuracy hurts if not consistent

Open langou opened this issue 2 years ago • 0 comments

This is a bug report from David Hough

Subject: LAPACK slaln2.f: more accuracy hurts if not consistent Date: Wed, 29 Oct 2014 15:29:20 -0700 (PDT)

Another example turned up by enabling fused FMA operations to replace multiply/add pairs at the compiler's discretion.

slaln2.f solves 1x1 or 2x2 systems of equations. Couldn't be simpler, right?

http://www.netlib.org/lapack/explore-html/da/d2d/slaln2_8f.html

But its invocation from the TESTING programs schkee.f/schkec.f/sget31.f and the test input data file sec.in gets into trouble with too-high residuals computed in sget31.f. Since slaln2.f uses total pivoting, how can it possibly have problems with matrices of dimension <= 2?

After all, slaln2.f computes the answers with code like:

298 * Real 1x1 system. 299 * 300 * C = ca A - w D 301 * 302 csr = caa( 1, 1 ) - wrd1 ... 323 x( 1, 1 ) = ( b( 1, 1 )*scale ) / csr

and

327 * Complex 1x1 system (w is complex) 328 * 329 * C = ca A - w D 330 * 331 csr = caa( 1, 1 ) - wrd1 332 csi = -wid1 ... 354 CALL sladiv( scaleb( 1, 1 ), scale*b( 1, 2 ), csr, csi, 355 $ x( 1, 1 ), x( 1, 2 ) )

and sget31 checks them with code like:

221 res = abs( ( caa( 1, 1 )-wrd1 )* 222 $ x( 1, 1 )-scale*b( 1, 1 ) )

279 res = abs( ( caa( 1, 1 )-wrd1 )* 280 $ x( 1, 1 )+( wid1 )x( 1, 2 )- 281 $ scaleb( 1, 1 ) ) 282 res = res + abs( ( -wid1 )x( 1, 1 )+ 283 $ ( caa( 1, 1 )-wr*d1 )x( 1, 2 )- 284 $ scaleb( 1, 2 ) )

A very straighforward residual computation: to check x = bs / c compute cx - b*s

The problems arise when higher precision is used capriciously, as when only one of sget31 or slaln2 replaces the ca-wd with an FMA, because they are separately compiled, perhaps with different compiler options.

Then the computed residual is nowhere near where it should be if ca-wd mostly cancels in single precision. And cchkec.f has this to say about that:

Error in SLALN2: RMAX = 0.357E+07 LMAX = 60625 NINFO= 0 71553 KNT= 230400

My working solution - again not canonical LAPACK because it uses higher precision - is to promote the candidates for fusing to higher precision: In slaln2:

< CSR = CAA( 1, 1 ) - WRD1

CSR = dble(CA)*dble(A( 1, 1 )) - dble(WR)*dble(D1)

< CSR = CAA( 1, 1 ) - WRD1

CSR = dble(CA)*dble(A( 1, 1 )) - dble(WR)*dble(D1)

In sget31:

< RES = ABS( ( CAA( 1, 1 )-WRD1 )* < $ X( 1, 1 )-SCALE*B( 1, 1 ) )

RES = ABS( ( dble(CA)dble(A( 1, 1 ))-dble(WR)dble(D1 )) $ (X( 1, 1 ))-(SCALE)(B( 1, 1 )) )

< RES = ABS( ( CAA( 1, 1 )-WRD1 )* < $ X( 1, 1 )+( WID1 )X( 1, 2 )- < $ SCALEB( 1, 1 ) ) < RES = RES + ABS( ( -WID1 )X( 1, 1 )+ < $ ( CAA( 1, 1 )-WR*D1 )X( 1, 2 )- < $ SCALEB( 1, 2 ) )

RES = ABS( x ( dble(CA)dble(A( 1, 1 ))-dble(WR)dble(D1) ) X( 1, 1 )+ x ( (WI)(D1) )(X( 1, 2 ))- SCALEB( 1, 1 ) x ) RES = RES + ABS( $ ( dble(CA)dble(A( 1, 1 ))-dble(WR)dble(D1) )X( 1, 2 )+ x (-(WI)(D1) )(X( 1, 1 ))- SCALEB( 1, 2 ) x )

That seems to do the trick. Interestingly, in my test environment, corresponding bugs did not come up in the [cdz] versions of the same functions.

It's not a question of whether higher precision is used. It's a question of whether higher precision is used consistently. In a world where fusing multiple-add is mostly advantageous, or at least harmless, and is either on or off globally, sensitive code needs to be protected.

The right solution is to have ways of indicating locally within the program text where higher precision or other value-changing optimizations should be avoided. Those places are relatively few, but hard to find after the fact. However the right solution awaits a more cooperative attitude from language designers and implementers.

langou avatar Jul 15 '21 15:07 langou