lapack
lapack copied to clipboard
dlasy2.f: substitution for small pivots does not always help
Bug report from David Hough
Subject: dlasy2.f: substitution for small pivots does not always help Date: Fri, 31 Oct 2014 13:50:08 -0700 (PDT)
The LAPACK TESTING functions
dchkee.f ddrves.f dlatme.f dget24.f found an interesting problem at the bottom of the calling chain
dgees.f dgeesx.f dtrsen.f dtrexc.f dlaexc.f dlarfx.f dlarfg.f dlasy2.f
The error message way up at the top was:
DDRVES: DGEES1 returned INFO= 6.
and
DGET24: DGEESX1 returned INFO= 6.
Oddly no corresponding problems reported for [scz].
The problem input was ded.in
After a lot of print statements I discovered that INFO was actually being set in dlaexc.f because the solution to a certain linear equation was not satisfying the equation to anything near the precision to be expected from double precision.
- Compute machine-dependent threshold for test for accepting
- swap.
EPS = DLAMCH( 'P' ) SMLNUM = DLAMCH( 'S' ) / EPS THRESH = MAX( TENEPSDNORM, SMLNUM ) *
- Solve T11X - XT22 = scale*T12 for X.
CALL DLASY2( .FALSE., .FALSE., -1, N1, N2, D, LDD, $ D( N1+1, N1+1 ), LDD, D( 1, N1+1 ), LDD, SCALE, X, $ LDX, XNORM, IERR ) *
- Swap the adjacent diagonal blocks.
...
- N1 = 2, N2 = 2: generate elementary reflectors H(1) and H(2) so
- that:
- H(2) H(1) ( -X11 -X12 ) = ( * * )
- ( -X21 -X22 ) ( 0 * )
- ( scale 0 ) ( 0 0 )
- ( 0 scale ) ( 0 0 )
U1( 1 ) = -X( 1, 1 ) U1( 2 ) = -X( 2, 1 ) U1( 3 ) = SCALE CALL DLARFG( 3, U1( 1 ), U1( 2 ), 1, TAU1 ) U1( 1 ) = ONE * TEMP = -TAU1*( X( 1, 2 )+U1( 2 )X( 2, 2 ) ) U2( 1 ) = -TEMPU1( 2 ) - X( 2, 2 ) U2( 2 ) = -TEMP*U1( 3 ) U2( 3 ) = SCALE CALL DLARFG( 3, U2( 1 ), U2( 2 ), 1, TAU2 ) U2( 1 ) = ONE *
- Perform swap provisionally on diagonal block in D.
CALL DLARFX( 'L', 3, 4, U1, TAU1, D, LDD, WORK ) CALL DLARFX( 'R', 4, 3, U1, TAU1, D, LDD, WORK ) CALL DLARFX( 'L', 3, 4, U2, TAU2, D( 2, 1 ), LDD, WORK ) CALL DLARFX( 'R', 4, 3, U2, TAU2, D( 1, 2 ), LDD, WORK ) *
- Test whether to reject swap.
IF( MAX( ABS( D( 3, 1 ) ), ABS( D( 3, 2 ) ), ABS( D( 4, 1 ) ), $ ABS( D( 4, 2 ) ) ).GT.THRESH )GO TO 50
So here we are... we computed X with a call to dlasy2, and after some reflections we check whether the stuff computed from X that should be zero is close enough - relative to THRESH. In the case in question, the test fails because d31 and d32 are too big:
-2.328823393046896E+127 6.846329887184472E+126
relative to THRESH 3.8066038479327334E+123 which is derived from the norm of the original data DNORM 1.7143419671096954E+138. THRESH looks like a pretty good criterion for distinguishing normal roundoff from something else.
Many print statements later, I discovered that the root cause of dlaexc's discomfort was that dlasy2 did not compute the solution X correctly - not even close. Why was that?
There is no extended precision or FMA involved. I found no uninitialized variables affecting matters. There are no IEEE exceptions except inexact and some harmless underflows in this kind of test in dlasy2 to determine whether scaling is required:
IF( ( EIGHT*SMLNUM )ABS( BTMP( 1 ) ).GT.ABS( T16( 1, 1 ) ) .OR. $ ( EIGHTSMLNUM )ABS( BTMP( 2 ) ).GT.ABS( T16( 2, 2 ) ) .OR. $ ( EIGHTSMLNUM )ABS( BTMP( 3 ) ).GT.ABS( T16( 3, 3 ) ) .OR. $ ( EIGHTSMLNUM )*ABS( BTMP( 4 ) ).GT.ABS( T16( 4, 4 ) ) ) THEN
If SMLNUM and BTMP(i) are both very small then their product might underflow, but it doesn't much matter; no rescaling is required on their behalf and the test fails.
So how can solving a 4x4 real linear equation system in double precision with complete pivoting and no arithmetic exceptions possibly fail? Only if it's singular?
Many more print statements later, I discovered that the critical step came at the end of the LU factorization. At each step the pivot is checked for smallness and if too small, a substitute is used instead. If indeed the pivot were dominated by roundoff then one little number might be as good as another. But one of the lessons I remember from Berkeley days is that roundoff in Gaussian elimination is not all random, and in fact is highly correlated... in a way that one ignores at one's peril.
However dlasy2 checks for tiny pivots:
- 2 by 2:
- op[TL11 TL12][X11 X12] +ISGN [X11 X12]*op[TR11 TR12] = [B11 B12]
- [TL21 TL22] [X21 X22] [X21 X22] [TR21 TR22] [B21 B22]
- Solve equivalent 4 by 4 system using complete pivoting.
- Set pivots less than SMIN to SMIN.
50 CONTINUE SMIN = MAX( ABS( TR( 1, 1 ) ), ABS( TR( 1, 2 ) ), $ ABS( TR( 2, 1 ) ), ABS( TR( 2, 2 ) ) ) SMIN = MAX( SMIN, ABS( TL( 1, 1 ) ), ABS( TL( 1, 2 ) ), $ ABS( TL( 2, 1 ) ), ABS( TL( 2, 2 ) ) ) SMIN = MAX( EPS*SMIN, SMLNUM )
So SMIN 2.3521337536917765E+122, is roundoff times the maximum magnitude 1.0593068696651851E+138 in the input data; here the overflow protection SMLNUM 1.0020841800044864E-292 doesn't enter the picture.
Thus SMIN looks like a reasonable criterion for roundoff domination in subsequent results. It's checked for each pivot of gaussian elimination:
- Perform elimination
DO 100 I = 1, 3 ... IF( ABS( T16( I, I ) ).LT.SMIN ) THEN INFO = 1 T16( I, I ) = SMIN END IF 100 CONTINUE ... IF( ABS( T16( 4, 4 ) ).LT.SMIN ) $ T16( 4, 4 ) = SMIN
A couple of odd points: INFO is set when substitutions are made for any pivot except the last. However dlaexc doesn't check the value of INFO anyway. I wonder why...
So that's the program. Here's the input data as a 4x4 double precision matrix to be factored:
1 t16 before pivoting 1 1 0.150+131 -0.385+132 0.482+122 0.000E+00 1 2 0.599+129 0.150+131 0.000E+00 0.482+122 1 3 -0.106+139 0.000E+00 0.150+131 -0.385+132 1 4 0.000E+00 -0.106+139 0.599+129 0.150+131
That's quite a range of magnitudes - the test program driver did that on purpose. Pivoting is based on the total remaining submatrix at each step:
DO 70 IP = I, 4 DO 60 JP = I, 4 IF( ABS( T16( IP, JP ) ).GE.XMAX ) THEN XMAX = ABS( T16( IP, JP ) ) IPSV = IP JPSV = JP END IF 60 CONTINUE 70 CONTINUE IF( IPSV.NE.I ) THEN CALL DSWAP( 4, T16( IPSV, 1 ), 4, T16( I, 1 ), 4 ) END IF IF( JPSV.NE.I ) THEN CALL DSWAP( 4, T16( 1, JPSV ), 1, T16( 1, I ), 1 ) END IF
During the solution of this particular matrix, there's plenty of pivoting going on:
row ipivot i ipsv 1 4 col jpivot i jpsv 1 2 row ipivot i ipsv 2 3 row ipivot i ipsv 3 4 col jpivot i jpsv 3 4 And here's the final factored form:
3 1 -0.106+139 0.000E+00 0.150+131 0.599+129 3 2 0.000E+00 -0.106+139 -0.385+132 0.150+131 3 3 0.363E-06 -0.141E-07 -0.109+125 0.427+122 3 4 -0.141E-07 -0.566E-09 -0.392E-02 0.171+122
and if the computation continued with this factorization, the callers all the way back to dchkee would have been happy. But instead
IF( ABS( T16( 4, 4 ) ).LT.SMIN ) $ T16( 4, 4 ) = SMIN
since 0.171+122 < 2.3521337536917765E+122
But now, even though 0.171+122 might have been smothered in roundoff, it was the right roundoff for this problem, and SMIN was a really bad substitute choice. (For this problem, none of the other pivots met the roundoff criterion for substitution.)
Why did we need a substitute for the final pivot anyway? The matrix factorization completed. The back solution follows - the right hand side B in TMP is replaced by the solution X:
DO 120 I = 1, 4 K = 5 - I TEMP = ONE / T16( K, K ) TMP( K ) = TMP( K )TEMP DO 110 J = K + 1, 4 TMP( K ) = TMP( K ) - ( TEMPT16( K, J ) )*TMP( J ) 110 CONTINUE 120 CONTINUE
The effect of the final pivot is fairly limited:
TEMP = ONE / T16( 4, 4 ) TMP( 4 ) = TMP( 4 )*TEMP It could be a problem if T16(4,4) were zero or within striking distance of the underflow threshold. But that is not the case here (and is "protected" against by the SMLNUM mentioned earlier).
So I have some questions for the LAPACK experts:
Why is dlasy2's INFO ignored by dlaexc? Because it has its own correctness criterion?
Why does dlasy2 set INFO for all substituted pivots except the last?
Why does dlasy2 substitute for the last pivot anyway?
Why not reduce SMIN by a bit? This makes everybody happy.
SMIN = MAX( 0.02 * EPS*SMIN, SMLNUM )
Here's what dlasy2's header comments say about INFO:
*> On exit, INFO is set to *> 0: successful exit. *> 1: TL and TR have too close eigenvalues, so TL or *> TR is perturbed to get a nonsingular equation.
Popping up a level, we see dlaexc's purpose and explanation of its INFO:
*> DLAEXC swaps adjacent diagonal blocks T11 and T22 of order 1 or 2 in *> an upper quasi-triangular matrix T by an orthogonal similarity *> transformation.
*> INFO is INTEGER *> = 0: successful exit *> = 1: the transformed matrix T would be too far from Schur *> form; the blocks are not swapped and T and Q are *> unchanged.
and the test that implements it again:
- Test whether to reject swap.
IF( MAX( ABS( D( 3, 1 ) ), ABS( D( 3, 2 ) ), ABS( D( 4, 1 ) ), $ ABS( D( 4, 2 ) ) ).GT.THRESH )GO TO 50 ...
- Exit with INFO = 1 if swap was rejected.
50 CONTINUE INFO = 1 RETURN
As a temporary expedient pending receipt of an authoritative pronouncement, I'll comment out the test and substitution on the final pivot in dlasy2.f and slasy2.f (and why isn't this a problem for the single precision tests?)
One more question: total pivoting is supposed to be a good thing, though almost unncessary except for the extremely unlucky, (in whose company I would count those that have debugged this test case).
But during debugging when I limited the pivot search (thinking the pivoting was implemented wrong somehow) from
DO 70 IP = I, 4 <<<<<< DO 60 JP = I, 4 IF( ABS( T16( IP, JP ) ).GE.XMAX ) THEN ... END IF 60 CONTINUE 70 CONTINUE
to
DO 70 IP = I, I <<<<<< DO 60 JP = I, 4 IF( ABS( T16( IP, JP ) ).GE.XMAX ) THEN ... END IF 60 CONTINUE 70 CONTINUE
I got quite a different factored matrix with far less range of magnitudes and no need for substitute pivots:
3 1 -0.385+132 0.150+131 0.000E+00 0.482+122 3 2 -0.390E-01 0.118+130 0.482+122 0.188+121 3 3 0.000E+00 -0.895E+09 -0.341+132 0.167+131 3 4 0.275E+07 -0.349E+08 -0.488E-01 0.135+130
Compare to the result of total pivoting before the substitution:
3 1 -0.106+139 0.000E+00 0.150+131 0.599+129 3 2 0.000E+00 -0.106+139 -0.385+132 0.150+131 3 3 0.363E-06 -0.141E-07 -0.109+125 0.427+122 3 4 -0.141E-07 -0.566E-09 -0.392E-02 0.171+122
Some people are just lucky I guess.