lapack icon indicating copy to clipboard operation
lapack copied to clipboard

Fix out-of-bounds in xORBDB2 and xORBDB3

Open weslleyspereira opened this issue 3 years ago • 10 comments

Fix #549.

Description In #549, we verified xORBDB2 and xORBDB3 use subarrays of size 0 starting at X(N+1), where N is the size of X. These arrays are passed as arguments to other subroutines that never reference the arrays. Therefore, this is a false positive case of out-of-bounds. The code provokes no errors in practice, but some compilers (like gfortran) identify and return an out-of-bounds error. This PR tries to rewrite the code while keeping the check of bounds. [edited]

Checklist

  • [x] Fix sORBDB2
  • [x] Fix sORBDB3
  • [x] Fix dORBDB2 and dORBDB3

weslleyspereira avatar May 11 '21 18:05 weslleyspereira

The strategy was:

  1. Remove I=P and I=Q from the loops.

  2. Replace

SLARFGP( 1, ALPHA, OUTOFBOUNDS_V, LDV, TAU )

by

TAU = 1 - SIGN( 1, ALPHA )
ALPHA = ABS( ALPHA )
  1. Remove calls of SLARF where the output argument was invalid.

weslleyspereira avatar May 11 '21 18:05 weslleyspereira

Codecov Report

Merging #551 (5a5d229) into master (2dafa3d) will decrease coverage by 0.00%. The diff coverage is 67.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #551      +/-   ##
==========================================
- Coverage   82.37%   82.36%   -0.01%     
==========================================
  Files        1894     1894              
  Lines      190677   190729      +52     
==========================================
+ Hits       157063   157099      +36     
- Misses      33614    33630      +16     
Impacted Files Coverage Δ
SRC/cunbdb3.f 40.00% <15.78%> (-1.54%) :arrow_down:
SRC/zunbdb3.f 39.43% <15.78%> (-1.48%) :arrow_down:
SRC/dorbdb3.f 41.79% <20.00%> (-1.76%) :arrow_down:
SRC/sorbdb3.f 41.79% <20.00%> (-1.76%) :arrow_down:
SRC/cunbdb1.f 87.30% <100.00%> (+0.63%) :arrow_up:
SRC/cunbdb2.f 88.73% <100.00%> (+0.85%) :arrow_up:
SRC/dorbdb1.f 86.66% <100.00%> (+0.70%) :arrow_up:
SRC/dorbdb2.f 88.23% <100.00%> (+0.93%) :arrow_up:
SRC/sorbdb1.f 86.66% <100.00%> (+0.70%) :arrow_up:
SRC/sorbdb2.f 88.23% <100.00%> (+0.93%) :arrow_up:
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2dafa3d...5a5d229. Read the comment docs.

codecov[bot] avatar May 11 '21 18:05 codecov[bot]

SLARFGP( 1, ALPHA, OUTOFBOUNDS_V, LDV, TAU )

Not related to this specific PR. But this reminds me that there is a bug in xLARFGP that I think we never managed to fix, and we initially inserted xLARFGP everywhere, and then removed it. It worries me that some routines in LAPACK are using xLARFGP. I do not know what to do about it since I do not know whether the code relies on the P or not. One idea would be to try with xLARFG and see if it works. Or we can leave it as is. It is possible that the usage of xLARFGP in this routine makes it such that xLARFGP is stable. All in all, I do not know.

langou avatar May 11 '21 20:05 langou

I do not know what to do about it since I do not know whether the code relies on the P or not.

The computed beta values are used as atan2 inputs. Without further review, one must assume that the code relies on the beta values being positive.

Description In #549, it was highlighted xORBDB2 and xORBDB3 had several out-of-bounds accesses. This PR tries to fix some of them.

There are no out-of-bounds accesses. Given an array X with valid indices 1, 2, ..., N, the code references the subarray of length zero starting at X( N + 1).

Checklist

* [x]  Fix sORBDB2

* [x]  Fix sORBDB3

* [ ]  Fix dORBDB2 and dORBDB3

Add to that:

  • [x] SORBDB1 (for when the column count Q is the smallest dimension)
  • [x] The array accesses in the other do-loop in SORBDB{1,2,3} when the smaller submatrix has more than zero rows
  • [x] DORBDB{1,2,3}
  • [x] CUNBDB{1,2,3}
  • [x] ZUNBDB{1,2,3}

christoph-conrads avatar May 11 '21 21:05 christoph-conrads

I see... If one wants to fix all of them, we would need to rewrite a number of algorithms with no real improvement in practice. I couldn't think of a good way to continue using -fcheck=bounds and avoid the false positives. Any ideas?

* I updated the description of this PR.

weslleyspereira avatar May 13 '21 12:05 weslleyspereira

I see... If one wants to fix all of them, we would need to rewrite a number of algorithms with no real improvement in practice. I couldn't think of a good way to continue using -fcheck=bounds and avoid the false positives. Any ideas?

  • I updated the description of this PR.

Valgrind and address sanitizers work well for me. They also avoid issue #339 when calling LAPACK code from C with the function prototypes in LAPACKE/include/lapack.h (the header omits the hidden arguments of type size_t that contain the length of the strings passed to the callee; consequently C callers do not initialize these values).

Valgrind can be used with any executable: valgrind --valgrind-arg1 --valgrind-arg2 -- my-program --my-arg. Address sanitizers can be used by passing -fsanitize=address on the compiler and linker command line. It is also possible to use address sanitizers without re-building the code by pre-loading them (e.g., env LD_PRELOAD=/path/to/libasan.so my-executable --my-args) but I am not sure which problems can be detected this way. On some Linux distributions, the address sanitizer libraries are not part of the GCC package installed by the system's package manager (e.g., on CentOS); use of the sanitizer will then cause linker errors.

christoph-conrads avatar May 14 '21 10:05 christoph-conrads

I see... If one wants to fix all of them, we would need to rewrite a number of algorithms with no real improvement in practice. I couldn't think of a good way to continue using -fcheck=bounds and avoid the false positives. Any ideas?

Just found in sgeqr2.f: Use X( MIN(I+1, N) ).

https://github.com/Reference-LAPACK/lapack/blob/2dafa3d2756a7825c23a8c8456781561e36668ae/SRC/sgeqr2.f#L177-L182

christoph-conrads avatar May 14 '21 10:05 christoph-conrads

Thanks, @christoph-conrads! That was easy to implement. I followed your suggestions and created a new commit. I am now applying the fixes to xORBDB1, and then I will look at the complex cases.

weslleyspereira avatar May 14 '21 13:05 weslleyspereira

Done! I hope I didn't miss anything.

weslleyspereira avatar May 14 '21 13:05 weslleyspereira

Thanks, @christoph-conrads! That was easy to implement. I followed your suggestions and created a new commit. I am now applying the fixes to xORBDB1, and then I will look at the complex cases.

This was not a suggestion, I just responded to your request for ideas, and I think this is a bad idea.

These changes trade readability for -fcheck=bounds shutting up about alleged out-of-bounds accesses in the special case where the matrix at hand is not a submatrix of a bigger matrix. Valgrind and the address sanitizer can detect OOB access in this special case, too. In any other case, -fcheck=bounds will not help. Moreover, if there is any numerical mistake in those 250+ lines touched by this commit, none of these tools will help and given variable names like I, I1, I2, I3, and I4, the humans cannot help either.

PS: I gave up on -fcheck=bounds months ago and rely only on the address sanitizer as well as Valgrind.

christoph-conrads avatar May 14 '21 19:05 christoph-conrads

I will drop this PR. I don't think this is a good idea anymore. I agree with the sentence:

These changes trade readability for -fcheck=bounds shutting up about alleged out-of-bounds accesses in the special case where the matrix at hand is not a submatrix of a bigger matrix.

I think it would be good to have a routine in the CI to run the tests under Valgrind. Thanks @christoph-conrads!

weslleyspereira avatar May 30 '23 16:05 weslleyspereira