
Crash with OpenBLAS using cblas_dgemm with square matrix of size 100 on Windows 10, Visual Studio 2017

Open RyoSahiba opened this issue 7 years ago • 28 comments

Hi all, I have a small test case that calls "cblas_dgemm()" from OpenBLAS, and when I use a matrix of size N x N with N >= 70, it crashes! I can compile and run the code for N <= 60, but it no longer works with N > 70. Here is my environment:

  • Windows 10 Professional 64 bits
  • Visual Studio 2017 (Test with Release x64)
  • OpenBLAS-0.2.20
  • Processor: Intel Core i7-5930K Haswell E-EP with 16GB of RAM

The test code is attached in test_cblas_dgemm.zip. It is the example from Intel's tutorial at https://software.intel.com/en-us/mkl-tutorial-c-multiplying-matrices-using-dgemm but uses OpenBLAS instead of Intel(R) MKL.

test_cblas_dgemm.zip
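
When a dgemm call misbehaves, it can help to have an independent result to compare against for the sizes that do run. The following is a minimal sketch of a naive row-major reference for C = alpha*A*B + beta*C. The name ref_dgemm and its argument order are assumptions made for illustration; it is not part of OpenBLAS or of the attached test.

```c
#include <assert.h>
#include <stddef.h>

/* Naive row-major reference for C = alpha*A*B + beta*C.
   A is m x k, B is k x n, C is m x n, each stored contiguously.
   ref_dgemm is a hypothetical helper, not an OpenBLAS function. */
void ref_dgemm(size_t m, size_t n, size_t k,
               double alpha, const double *A, const double *B,
               double beta, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (size_t p = 0; p < k; ++p)
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}
```

Comparing its output element by element with the library result for a small N is a cheap way to check that the arguments and leading dimensions passed to the library are what was intended.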

What did I do wrong, or does anyone have an idea what the issue is? Thanks!

RyoSahiba avatar Dec 24 '17 15:12 RyoSahiba

Not reproducible on Linux; did you build OpenBLAS yourself or use a precompiled library ?

martin-frbg avatar Dec 24 '17 16:12 martin-frbg

I built it myself using the instructions in https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide

  • Installed MSYS2
  • Ran make
  • Installed with

make install PREFIX=/c/OpenBLAS/

RyoSahiba avatar Dec 24 '17 17:12 RyoSahiba

I did not see any issues during the installation, and when I run the makefile in the repository's

/utest

it reports 0 failed tests

RyoSahiba avatar Dec 24 '17 18:12 RyoSahiba

Any chance to get the VS2017 debugger to tell where it crashes ?

martin-frbg avatar Dec 24 '17 19:12 martin-frbg

Unfortunately no!

  • With N <= 50, when I try to "STEP INTO" at the line where I call cblas_dgemm() in debug mode, it just goes to the next line (instruction line 131)
  • But with N = 100, if I try the same thing, I get the following message

[Screenshot: try_debug_crashopenblas]

And when I run it, here is the screenshot

[Screenshot: crashopenblas]

RyoSahiba avatar Dec 24 '17 20:12 RyoSahiba

Can you set the compiler options so that symbols are available to the debugger? At the very least, the disassembly around the instruction pointer of each thread at the moment of the crash would help.

The downloadable precompiled version has partial symbols from mingw that should point to the faulty function (at least dgemm_, the F77 routine that cblas_dgemm ends up calling); from there you can try to guess the failing code line from the disassembly. It is worth trying those builds: https://sourceforge.net/projects/openblas/files/v0.2.19/

x64dbg may cope better, especially when you mix gcc and MSVC debug symbols (which is likely the case here).

Does it crash when compiled with 'g++ sample.cpp -o sample.exe -lopenblas' using the same mingw you have? I.e. it could be an int32/int64 mismatch in the cblas call.

brada4 avatar Dec 24 '17 20:12 brada4

OK thanks, I'll keep you posted.

RyoSahiba avatar Dec 24 '17 21:12 RyoSahiba

OK, so I tested with 'g++' from mingw and it works fine, there is no crash. I also tried the version you mentioned, compiled in debug mode with Visual Studio. I don't really understand what's going on and I'm not an expert on the disassembly view, but here is the call stack at the time of the crash:

[Screenshot: call_stack_crash]

The disassembly for the "WaitOnCriticalSection()" (last call in the call stack) is

[Screenshot: waitoncritical]

Does it help?

RyoSahiba avatar Dec 26 '17 20:12 RyoSahiba

That's just what is left after the threads. Try two library builds, one with INTERFACE64=0 and one with INTERFACE64=1. One needs int32 for integers, the other int64, and if they get mixed up, all pointer arguments after the first type mismatch are garbage that points to unallocated memory.
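
One way such a width mismatch can show up, sketched under assumptions: with a by-reference interface such as the Fortran-style dgemm_, a caller that stores a dimension in 32 bits hands over an address, and a library built for 64-bit integers reads 8 bytes from it. The helper name read_as_int64 is hypothetical and only imitates that read; it does not call OpenBLAS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the value a 64-bit reader would see at the
   address of a 32-bit dimension that is followed in memory by another
   32-bit value. */
int64_t read_as_int64(const int32_t pair[2])
{
    int64_t wide;
    assert(pair != NULL);
    /* memcpy keeps the reinterpretation well defined in C. */
    memcpy(&wide, pair, sizeof wide);
    return wide;
}
```

If the 32-bit neighbor is nonzero, the 64-bit reader no longer sees the intended dimension, which is the kind of garbage size that can send a library into unallocated memory.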

brada4 avatar Dec 26 '17 20:12 brada4

OK so I tested with two library versions

make INTERFACE64=0

and

make INTERFACE64=1

but I still have a crash. For each of them, I tried with

  • VS 2017 (Release x64) with int and long long integer
  • VS 2015 (Release x64) with int and long long integer

RyoSahiba avatar Dec 28 '17 17:12 RyoSahiba

Can you run any crashing .exe from within x64dbg so that backtraces are captured? So far the VS debugger shows ntdll.dll accessing memory at 0x24, which is wrong. What is needed is the call chain leading to this invalid access.

brada4 avatar Dec 29 '17 05:12 brada4

OK thanks I'll keep you posted.

RyoSahiba avatar Dec 29 '17 11:12 RyoSahiba

It looks like a mixed-up calling convention: the OpenBLAS DLL would expect cdecl (/Gd), i.e. the default calling convention. https://msdn.microsoft.com/en-us/library/46t77ak2.aspx There is no problem with either piece of code, just with the compiler settings. (int32 is the default for the binary build.)

brada4 avatar Dec 29 '17 13:12 brada4

Sorry, but I'm not sure I understand. I have checked, and the default value in VS is already cdecl /Gd

[Screenshot: cdecl_default]

So what should I do exactly to the compiler settings? Thanks !

RyoSahiba avatar Dec 29 '17 20:12 RyoSahiba

Yours is the correct setting (all settings at their defaults), and you have tried all the possible (mis)configurations. I think most software that redistributes OpenBLAS on Windows uses one or another flavor of mingw gcc in its toolchain. While waiting for somebody with Windows access, you can try MinGW g++ or clang/LLVM for Windows, which, while not very comfortable, will cover a great part of your matrix multiplication needs. Also, if you happen to discover the "correct" set of settings for linking to a cdecl-only DLL, you are more than welcome to correct the FAQ pages.

brada4 avatar Dec 29 '17 22:12 brada4

OK, now I understand, and yes, I will correct the FAQ pages if that happens. Thanks a lot for your help. Do you want me to close this ticket, or should I leave it open for reference?

RyoSahiba avatar Dec 30 '17 16:12 RyoSahiba

Keep it open. The deficiency is still here.

brada4 avatar Dec 30 '17 16:12 brada4

Try this: in the MSVC project properties, under Linker > Advanced, "Randomize Base Address" is set to "Yes". Set it to "No (/DYNAMICBASE:NO)".

shoshia avatar Jan 19 '18 18:01 shoshia

Hi shoshia , I tried it but the crash is still there.

RyoSahiba avatar Jan 20 '18 13:01 RyoSahiba

You are right. I tested it, and /DYNAMICBASE:NO has no effect when cblas functions are called. But your code compiles and works for me with VS2013. Can you try this build (it is my personal build) http://www.filehosting.org/file/details/720397/openblas4.tgz and here are the required DLLs http://www.filehosting.org/file/details/720398/mingw_dlls.tgz

shoshia avatar Jan 21 '18 17:01 shoshia

Yes, it also compiles for me with VS2017, so the issue is not the compilation. And it also works fine as long as the size of the matrix is <= 50 x 50. Sorry, but I can't get your files; I get the following message when I click on the links

[Screenshot: "Oops, page not found" on filehosting.org, in Google Chrome]

Unfortunately I don't have VS2013

RyoSahiba avatar Jan 22 '18 13:01 RyoSahiba

Could you try with this modified copy of driver/level3/level3_thread.c (rename from .txt suffix that the issue tracker wants) ? valgrind/helgrind finds some races with the original, not sure if they could be related to the VS2017 problem in any way. level3_thread.txt

martin-frbg avatar Jan 22 '18 16:01 martin-frbg

OK, I have tested with your copy, but the build of OpenBLAS fails with the following error

[Screenshot: build error with the modified level3_thread.c]

Are there other things that I need to change before building OpenBLAS?

RyoSahiba avatar Jan 22 '18 18:01 RyoSahiba

Sorry, that is a typo I made in line 105, should be "level3_lock" not "level_3lock" obviously (and the #ifdef made sure the error did not occur on my platform...)

martin-frbg avatar Jan 22 '18 18:01 martin-frbg

OK thanks now I can build OpenBLAS but I got an error message with VS2017 when I build my project:

[Screenshot: error_modif_level3]

However, I don't get any error message while building OpenBLAS

RyoSahiba avatar Jan 23 '18 20:01 RyoSahiba

Ah, sorry. I "borrowed" the locking mechanism from blas_server.c without noticing that the Windows platform has its own version as blas_server_win32.c there (and using critical sections instead of mutexes). So with luck I have "only" dragged a dependency on the (mingw?) libpthread into your code, or I may have caused actual breakage. :-(

martin-frbg avatar Jan 23 '18 21:01 martin-frbg

This version has the pthread_mutex_(un)lock replaced with Enter/LeaveCriticalSection, but I have not tested if it even compiles on Windows. Probably better to wait until I have set up a Windows system for testing, or until someone chimes in with a better idea. level3_thread.txt
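
For readers unfamiliar with the swap described here, the following is a rough sketch of wrapping both primitives behind one interface: CRITICAL_SECTION on Windows, pthread_mutex_t elsewhere. All demo_* names are hypothetical and chosen for illustration; this is not the code in level3_thread.c or blas_server_win32.c.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
/* Windows: critical sections, no dependency on a pthread library. */
typedef CRITICAL_SECTION demo_lock_t;
void demo_lock_init(demo_lock_t *l)    { InitializeCriticalSection(l); }
void demo_lock_acquire(demo_lock_t *l) { EnterCriticalSection(l); }
void demo_lock_release(demo_lock_t *l) { LeaveCriticalSection(l); }
void demo_lock_destroy(demo_lock_t *l) { DeleteCriticalSection(l); }
#else
#include <pthread.h>
/* Other platforms: a plain pthread mutex. */
typedef pthread_mutex_t demo_lock_t;
void demo_lock_init(demo_lock_t *l)
{
    int rc = pthread_mutex_init(l, NULL);
    assert(rc == 0);
    (void)rc;
}
void demo_lock_acquire(demo_lock_t *l) { pthread_mutex_lock(l); }
void demo_lock_release(demo_lock_t *l) { pthread_mutex_unlock(l); }
void demo_lock_destroy(demo_lock_t *l) { pthread_mutex_destroy(l); }
#endif

/* Example critical section: bump a shared counter under the lock. */
long demo_increment(demo_lock_t *l, long *counter)
{
    long value;
    demo_lock_acquire(l);
    value = ++*counter;
    demo_lock_release(l);
    return value;
}
```

Keeping the platform choice behind a single typedef means code built with Visual Studio links only against the Windows API, which avoids dragging a mingw libpthread dependency into the caller's project.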

martin-frbg avatar Jan 23 '18 22:01 martin-frbg

Thanks martin-frbg, I have reverted to the previous version, so there is no breakage :) OK, I can wait for the solution on Windows :)

RyoSahiba avatar Jan 24 '18 10:01 RyoSahiba