lapack icon indicating copy to clipboard operation
lapack copied to clipboard

Correctness and efficiency of LAPACKE_xLACPY when matrix_layout is LAPACK_ROW_MAJOR

Open digikar99 opened this issue 3 years ago • 2 comments

Description

Going by the discussion here (which I'm summarizing below), it seems that lapacke_slacpy and friends are (i) copying more than twice as much when asked to copy only the lower or upper triangular parts (ii) copying over the uninitialized malloc-ed region back to b.

https://github.com/Reference-LAPACK/lapack/blob/28f7e8309608b92aaec2e2556d4b25d758ccada9/LAPACKE/src/lapacke_slacpy_work.c#L70-L76

This results in the following code working as expected while using LAPACK_COL_MAJOR but not working as expected while using LAPACK_ROW_MAJOR.

Code, compile using gcc test-lacpy.c -llapacke -o test-lacpy:

#include <lapacke.h>
#include <stdio.h>

#define size 9

int main(){
  float a[size] = {1,2,3,4,5,6,7,8,9};
  float b[size];

  for(int i=0; i<size; i++) b[i] = 100;

  // Works as expected with LAPACK_COL_MAJOR but not with LAPACK_ROW_MAJOR
  LAPACKE_slacpy(LAPACK_COL_MAJOR, 'U', 3, 3, a, 3, b, 3);

  for(int i=0; i<3; i++){
    for(int j=0; j<3; j++){
      printf("%f ", b[i*3+j]);
    }
    printf("\n");
  }

  return 0;
}

When compiled with LAPACK_COL_MAJOR, this produces the output:

1.000000 100.000000 100.000000 
4.000000 5.000000 100.000000 
7.000000 8.000000 9.000000 

But when compiled with LAPACK_ROW_MAJOR, this produces the unexpected output (expectation is 100 instead of 0):

1.000000 2.000000 3.000000 
0.000000 5.000000 6.000000 
0.000000 0.000000 9.000000

Might a native C solution be preferable for this particular function/family?

Checklist

  • [x] I've included a minimal example to reproduce the issue
  • [ ] I'd be willing to make a PR to solve this issue (EDIT: I'd be willing to make a PR once someone confirms the plan of action.)

PS: Credits to stassats for pointing out the bug!

digikar99 avatar Oct 13 '22 12:10 digikar99

That's an interesting corner case. Strictly speaking, the code is correct image since B = A in all upper triangular entries, but the "copy over" is likely an expected side effect.

I would support treating B as an in,out variable. Something like:

        /* Transpose input matrices */
        LAPACKE_sge_trans( matrix_layout, m, n, a, lda, a_t, lda_t );
+       if (LAPACKE_lsame(uplo, 'L') || LAPACKE_lsame(uplo, 'U')) {
+           LAPACKE_sge_trans( matrix_layout, m, n, b, ldb, b_t, ldb_t );
+       }
        /* Call LAPACK function and adjust info */
        LAPACK_slacpy( &uplo, &m, &n, a_t, &lda_t, b_t, &ldb_t );

angsch avatar Oct 31 '22 20:10 angsch

In the cases when uplo is L or U, copying should involve copying of half the array. However, going down the path of transposing the inputs and then the outputs, would involve memory allocation, as well as transposing the 2 input arrays, and 1 output array, effectively needing 3.5 array copying to copy what should essentially have required copying just half the array.

This is why I wonder if a native C solution might be preferable for this particular function for the ROW_MAJOR layout: write the function using if-else statements and for-loops and never call the malloc, LAPACKE_sge_trans, LAPACKE_slacpy functions at all.

It also seems possible to call the slacpy function directly with appropriately changed arguments, without the intermediate transposes.

digikar99 avatar Nov 10 '22 01:11 digikar99