
WIN64: OpenBLAS ctest with errors when linked against libopenblas.dll but works if built statically.

Open carlkl opened this issue 3 years ago • 5 comments

This is with Windows 64 bit, Msys2 gcc-12. (Both: UCRT and MSVCRT)

Out of curiosity, I tested what happens if the test programs located in ctest are dynamically linked against libopenblas.dll. To my surprise, the blat2 and blat3 tests mostly failed. If the test programs are statically linked against the static library, all is fine. (BTW: make builds only statically linked test programs.)

This result is surprising, as libopenblas.dll itself is basically built from the static library with the addition of dllinit.obj, which is responsible for calling gotoblas_init or gotoblas_dynamic_init.

For now, I have not run further tests with the test or utest programs.

Any ideas what is going on?

xzcblat3 < zin3

C:\dev\home\tmp\OpenBLAS_gh\ctest>xzcblat3 < zin3
TESTS OF THE COMPLEX*16        LEVEL 3 BLAS

 THE FOLLOWING PARAMETER VALUES WILL BE USED:
   FOR N                   0     1     2     3     5     9    35
   FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)
   FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)

 ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00

 COLUMN-MAJOR AND ROW-MAJOR DATA LAYOUTS ARE TESTED

 RELATIVE MACHINE PRECISION IS TAKEN TO BE  2.2E-16

 ** On entry to ZGEMM  parameter number  0 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 1 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  0 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 1 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  0 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 1 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  0 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 1 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  1 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 2 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  1 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 2 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  2 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 3 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  2 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 3 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  3 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 4 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  3 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 4 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  3 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 4 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  3 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 4 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  4 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 5 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  4 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 5 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  4 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 5 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  4 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 5 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  5 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 6 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  5 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 6 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  5 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 6 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  5 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 6 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  8 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 9 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  8 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 9 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  8 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 9 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  8 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 9 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 10 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 11 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 10 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 11 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 10 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 11 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 10 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 11 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 13 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 14 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 13 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 14 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 13 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 14 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 13 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 14 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  4 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 4 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  4 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 4 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  4 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 4 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  4 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 4 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  3 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 5 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  3 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 5 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  3 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 5 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  3 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 5 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  5 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 6 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  5 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 6 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  5 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 6 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  5 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 6 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 10 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 9 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 10 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 9 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 10 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 9 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 10 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 9 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  8 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 11 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  8 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 11 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  8 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 11 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number  8 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 11 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 13 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 14 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 13 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 14 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 13 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 14 NOT DETECTED BY cblas_zgemm *****
 ** On entry to ZGEMM  parameter number 13 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 14 NOT DETECTED BY cblas_zgemm *****
***** cblas_zgemm FAILED THE TESTS OF ERROR-EXITS *******

 cblas_zgemm  PASSED THE COLUMN-MAJOR COMPUTATIONAL TESTS ( 27783 CALLS)
 cblas_zgemm  PASSED THE ROW-MAJOR    COMPUTATIONAL TESTS ( 27783 CALLS)

 ** On entry to ZHEMM  parameter number  0 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 1 NOT DETECTED BY cblas_zhemm *****
 ** On entry to ZHEMM  parameter number  1 had an illegal value
***** ILLEGAL VALUE OF PARAMETER NUMBER 2 NOT DETECTED BY cblas_zhemm *****
 ** On entry to ZHEMM  parameter number  2 had an illegal value
......

carlkl avatar Sep 30 '22 19:09 carlkl

Weird, and does not even look like missed initialization. ABI mismatch between the C and Fortran code parts perhaps - could the dynamic version be picking up a different libgfortran?

martin-frbg avatar Sep 30 '22 20:09 martin-frbg

Unfortunately not. I tested this on two computers with a clean Msys2 install. And traced the linking process with -Wl,-t.

All this reminds me a little of the Windows fmod error from two years ago (stale FPU state).

carlkl avatar Sep 30 '22 21:09 carlkl

I would like to add that I see exactly the same behavior with a self-compiled OpenBLAS install.

carlkl avatar Sep 30 '22 21:09 carlkl

Well, of course the error message is similar, as it simply tells you that an input argument is wrong - but back then it was something (mis)computed in an earlier step, whereas here it is one of the dimensions that is wrong (or passed wrongly). If it isn't a wrong-generation libgfortran getting loaded, having only parts of the code built with INTERFACE64 would probably cause similar effects (but I have no idea if/how that could have happened).

martin-frbg avatar Oct 01 '22 08:10 martin-frbg

Oh wait, there is something else going wrong. The illegal parameter errors are expected at this point in the tests; it is only that the test framework does not recognize that the invalid values were caught and reported correctly. Are you building with the gfortran compiler at all, or are you using the f2c-converted fallbacks that were added in 0.3.21? I suspect it might be the latter with some f2c-induced quirk, though that would not immediately explain why everything works in the static build.

martin-frbg avatar Oct 01 '22 14:10 martin-frbg

This is a limitation on Windows. The tests rely on a test-specific xerbla to catch the illegal parameters that the tests pass deliberately. This works fine on Unix, where the routines in the shared library use the xerbla in the executable rather than the xerbla in the shared library, because the symbol in the shared library is a weak symbol. On Windows, however, this is not the case: the xerbla in the DLL is used, which leads to the error messages.

isuruf avatar Oct 04 '22 19:10 isuruf

@isuruf, xzcblat3 < zin3 uses cblas_xerbla. From the analysis of the linker map, it appears that the dynamically linked binary does not use the OpenBLAS version.

I now have a debug build of everything. Unfortunately, gdb doesn't seem to work with these binaries (usually that is no problem), so I have to dig into this further.

carlkl avatar Oct 05 '22 10:10 carlkl

Oops, thanks @isuruf. Guess I should not try to do half-assed support when my f(r)oggy brain is completely elsewhere. (And it may make sense to add a short comment to the Makefile?)

@carlkl not sure what the linker map tells you, but the "On entry to..." message is generated by the xerbla in the DLL (which only prints error messages) rather than by the local one in the test case (which sets a global variable depending on whether the BLAS function returned the expected error code).

martin-frbg avatar Oct 05 '22 12:10 martin-frbg

xzcblat3.exe, for example, is not linked against cblas_xerbla from libopenblas.dll. However, the test programs are linked against the cblas lapack routines in libopenblas.dll, and these routines use cblas_xerbla from libopenblas.dll. There is no easy way to prevent this. Lesson learned: on the Windows platform these tests need to be built statically against libopenblas. Thanks for this insight @isuruf

carlkl avatar Oct 06 '22 08:10 carlkl

Since all of this is expected behavior on Windows, I will close this issue.

carlkl avatar Oct 06 '22 18:10 carlkl