OpenBLAS
Crash with OpenBLAS using cblas_dgemm with a square matrix of size 100 on Windows 10, Visual Studio 2017
Hi all, I have a small test case where I call "cblas_dgemm()" using OpenBLAS, and when I use a matrix of size N x N with N >= 70, it crashes! I can compile and run the code for N <= 60, but it no longer works with N >= 70. Here is my environment:
- Windows 10 Professional 64 bits
- Visual Studio 2017 (Test with Release x64)
- OpenBLAS-0.2.20
- Processor: Intel Core i7-5930K Haswell E-EP with 16GB of RAM
The test code is attached in test_cblas_dgemm.zip. It is the example from Intel's tutorial https://software.intel.com/en-us/mkl-tutorial-c-multiplying-matrices-using-dgemm but using OpenBLAS instead of Intel(R) MKL.
What did I do wrong, or does anyone have an idea what the issue is? Thanks!
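For readers without the attachment, a reproducer along these lines can be sketched as follows. This is not the attached test_cblas_dgemm.zip; it is a minimal sketch that assumes row-major N x N matrices filled with small integer values, as in the Intel tutorial. The macro `HAVE_OPENBLAS` and the helpers `ref_dgemm`, `max_abs_diff` and `run_case` are hypothetical names introduced for illustration; only `cblas_dgemm` and its constants come from the CBLAS interface.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef HAVE_OPENBLAS   /* hypothetical macro: define it when linking against OpenBLAS */
#include <cblas.h>
#endif

/* Naive reference: C = alpha * A * B + beta * C, all n x n, row-major. */
static void ref_dgemm(int n, double alpha, const double *A, const double *B,
                      double beta, double *C)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}

/* Largest absolute element-wise difference between two n x n matrices. */
static double max_abs_diff(int n, const double *X, const double *Y)
{
    double worst = 0.0;
    for (int i = 0; i < n * n; i++) {
        double d = X[i] - Y[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}

/* Multiply two n x n matrices and return the deviation from the naive
   reference, or -1.0 if an allocation fails. */
static double run_case(int n)
{
    size_t count = (size_t)n * (size_t)n;
    double *A = malloc(count * sizeof *A);
    double *B = malloc(count * sizeof *B);
    double *C = calloc(count, sizeof *C);
    double *R = calloc(count, sizeof *R);
    double diff = -1.0;

    if (A && B && C && R) {
        for (size_t i = 0; i < count; i++) {
            A[i] = (double)(i + 1);      /* small integers: sums stay exact */
            B[i] = -(double)(i + 1);
        }
        ref_dgemm(n, 1.0, A, B, 0.0, R);
#ifdef HAVE_OPENBLAS
        /* M = N = K = n, and every leading dimension is n in row-major order. */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A, n, B, n, 0.0, C, n);
#else
        ref_dgemm(n, 1.0, A, B, 0.0, C); /* fallback so the sketch runs alone */
#endif
        diff = max_abs_diff(n, C, R);
    }
    free(A); free(B); free(C); free(R);
    return diff;
}
```

Calling `run_case(50)` and `run_case(100)` from a `main()` covers one size reported as working and one reported as crashing. Because the inputs are small integers, every partial sum is exactly representable in a double, so a non-zero return value would point to a wrong result rather than rounding noise.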
Not reproducible on Linux; did you build OpenBLAS yourself or use a precompiled library ?
I built it myself using the instructions in https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide
- Installed MSYS2
- Run make
- Installed as
make install PREFIX=/c/OpenBLAS/
I did not see any issue during the installation and when I run the makefile in the repository
/utest
it reports 0 failed tests
Any chance to get the VS2017 debugger to tell where it crashes ?
Unfortunately no!
- With N <= 50, when I try to "STEP INTO" at the line where I call cblas_dgemm() in debug mode, it just goes to the next line (instruction line 131)
- But with N = 100, if I try the same thing, I got the following message

And when I run it, here is the screenshot

Can you rig the compiler options to make symbols available to the debugger? At least the disassembly around the instruction pointer from each thread at the moment of the crash...
The downloadable precompiled version has partial symbols from mingw that point to the faulty function (it should be at least dgemm_ F77, called from cblas_dgemm down the road); from there you can try to guess the failing code line from the disassembly. It is worth trying them: https://sourceforge.net/projects/openblas/files/v0.2.19/
x64dbg may be more brainy, especially when you try to mix gcc and msvc debug symbols (which is likely the case here)
Does it crash when compiled with 'g++ -lopenblas sample.cpp -o sample.exe' from the same mingw you have? I.e., it could be an int32/int64 mismatch in the cblas call.
OK thanks, I'll keep you posted.
OK, so I tested with 'g++' from mingw and it works fine, there is no crash. I also tried the version you mentioned, compiled in debug mode with Visual Studio. I don't really understand what's going on and I'm not an expert on the disassembly view, but here is the call stack at the time of the crash:

The disassembly for the "WaitOnCriticalSection()" (last call in the call stack) is

Does it help?
That's just what is left after the threads. Try two library versions, built with INTERFACE64=0 and INTERFACE64=1. One needs int32 for integers, the other int64, and if they get mixed up, all pointer arguments after the first type mismatch are garbage that points to unallocated memory.
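The effect of such a width mismatch can be illustrated without any BLAS library at all. The sketch below is an assumption-laden illustration, not OpenBLAS code: it mimics an integer argument passed by reference (as in the Fortran-style dgemm_ interface), where the caller stores 32-bit values and the callee reads 64 bits from the same address. The names `read_as_int32`, `read_as_int64` and `dims` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read an integer argument the way a 32-bit-integer build would:
   4 bytes starting at the address it was given. */
static int32_t read_as_int32(const void *arg)
{
    int32_t value;
    memcpy(&value, arg, sizeof value);   /* memcpy sidesteps aliasing rules */
    return value;
}

/* Read the same argument the way a 64-bit-integer build would:
   8 bytes starting at the same address, swallowing whatever follows. */
static int64_t read_as_int64(const void *arg)
{
    int64_t value;
    memcpy(&value, arg, sizeof value);
    return value;
}

/* Two dimensions stored back to back, as a caller using 32-bit ints
   might hold them (hypothetical layout for the demonstration). */
static const int32_t dims[2] = {100, 100};
```

Here `read_as_int32(&dims[0])` yields 100, while `read_as_int64(&dims[0])` merges both neighbouring values into 100 * 2^32 + 100, a dimension far beyond any allocated buffer. With the by-value cblas_dgemm interface the compiler converts integer arguments according to the header, so the analogous failure would require the header and the library to disagree about the integer width.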
OK so I tested with two library versions
make INTERFACE64=0
and
make INTERFACE64=1
but I still have a crash. For each of them, I tried with
- VS 2017 (Release x64) with int and long long integer
- VS 2015 (Release x64) with int and long long integer
Can you run any crashing .exe from within x64dbg so that backtraces are captured? So far the VS debugger shows ntdll.dll accessing memory at 0x24, which is wrong. What is needed is the call chain leading to this invalid access.
OK thanks I'll keep you posted.
It looks like a mixed-up calling convention; the OpenBLAS DLL would expect cdecl (/Gd), i.e. the default calling convention. https://msdn.microsoft.com/en-us/library/46t77ak2.aspx There is no problem with either part of the code, just the compiler settings. (32-bit int is the default for the binary build.)
Sorry, but I'm not sure I understand; I have checked, and the default value in VS is already cdecl (/Gd)

So what should I do exactly to the compiler settings? Thanks !
Yours is the correct setting (all settings at their defaults), and you tried all possible (mis-)configurations. I think most software redistributing OpenBLAS on Windows uses one or another sort of mingw gcc in its toolchain. While waiting for somebody with Windows access, you can try MinGW g++ or clang-llvm for Windows, which, while not very comfortable, will cover a great part of your matrix multiplication needs. Also, if you accidentally discover the "correct" set of settings to link to a cdecl-only DLL, you are more than welcome to correct the FAQ pages.
OK, now I understand, and yes, I will correct the FAQ pages if that happens. Thanks a lot for your help. Do you want me to close this ticket, or should I leave it open for reference purposes?
Keep it open. The deficiency is still here.
Try this: in the MSVC project properties, under Linker > Advanced, "Randomize Base Address" is set to "Yes". Set it to "No (/DYNAMICBASE:NO)".
Hi shoshia, I tried it but the crash is still there.
You are right. I tested it, and /DYNAMICBASE:NO has no effect when cblas functions are called. But your code compiles and works for me with VS2013. Can you try this build (it is my personal build) http://www.filehosting.org/file/details/720397/openblas4.tgz and here are the required DLLs http://www.filehosting.org/file/details/720398/mingw_dlls.tgz
Yes, it also compiles for me with VS2017, so the issue is not the compilation. And it also works fine as long as the size of the matrix is <= 50 x 50. Sorry, but I can't get your files; I got the following message when I clicked on the links

Unfortunately I don't have VS2013
Could you try with this modified copy of driver/level3/level3_thread.c (rename from .txt suffix that the issue tracker wants) ? valgrind/helgrind finds some races with the original, not sure if they could be related to the VS2017 problem in any way. level3_thread.txt
OK, I have tested with your copy, but the build of OpenBLAS fails with the following error

Are there other things that I need to change before building OpenBLAS?
Sorry, that is a typo I made in line 105, should be "level3_lock" not "level_3lock" obviously (and the #ifdef made sure the error did not occur on my platform...)
OK thanks now I can build OpenBLAS but I got an error message with VS2017 when I build my project:

However, I don't get any error message while building OpenBLAS
Ah, sorry. I "borrowed" the locking mechanism from blas_server.c without noticing that the Windows platform has its own version as blas_server_win32.c there (and using critical sections instead of mutexes). So with luck I have "only" dragged a dependency on the (mingw?) libpthread into your code, or I may have caused actual breakage. :-(
This version has the pthread_mutex_(un)lock replaced with Enter/LeaveCriticalSection, but I have not tested if it even compiles on Windows. Probably better to wait until I have set up a Windows system for testing, or until someone chimes in with a better idea. level3_thread.txt
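The kind of substitution described here, a pthread mutex on POSIX and a critical section on Windows behind one interface, can be sketched as below. This is not the attached level3_thread.txt and not the code in blas_server_win32.c; the `demo_lock_*` names, `shared_counter` and `guarded_increment` are hypothetical and exist only to show the pattern.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
/* Windows: a CRITICAL_SECTION must be initialised at run time. */
typedef CRITICAL_SECTION demo_lock_t;
static void demo_lock_init(demo_lock_t *l)    { InitializeCriticalSection(l); }
static void demo_lock_acquire(demo_lock_t *l) { EnterCriticalSection(l); }
static void demo_lock_release(demo_lock_t *l) { LeaveCriticalSection(l); }
static void demo_lock_destroy(demo_lock_t *l) { DeleteCriticalSection(l); }
#else
#include <pthread.h>
/* POSIX: the same four operations mapped onto a pthread mutex. */
typedef pthread_mutex_t demo_lock_t;
static void demo_lock_init(demo_lock_t *l)
{
    int rc = pthread_mutex_init(l, NULL);
    assert(rc == 0);
    (void)rc;
}
static void demo_lock_acquire(demo_lock_t *l) { pthread_mutex_lock(l); }
static void demo_lock_release(demo_lock_t *l) { pthread_mutex_unlock(l); }
static void demo_lock_destroy(demo_lock_t *l) { pthread_mutex_destroy(l); }
#endif

/* Shared state protected by the lock (hypothetical example data). */
static long shared_counter = 0;
static demo_lock_t counter_lock;

/* Increment the shared counter inside the locked region and return the
   new value; counter_lock must have been initialised beforehand. */
static long guarded_increment(void)
{
    long seen;
    demo_lock_acquire(&counter_lock);
    seen = ++shared_counter;
    demo_lock_release(&counter_lock);
    return seen;
}
```

A caller would run `demo_lock_init(&counter_lock)` once before any thread calls `guarded_increment()` and `demo_lock_destroy(&counter_lock)` once all threads are done. Keeping the platform split inside the lock wrapper means the calling code has no direct dependency on libpthread when built on Windows.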
Thanks martin-frbg, I have reverted to the previous version so that means there is no breakage :) Ok I can wait for the solution on Windows :)