Andrew
You are proposing a similar, working idea that is close to the library in the OP, which in turn wraps IPP parallel sections inside OMP parallel ones. I just looked into...
What is missing in travis.yml?
There is a fixed, compiled-in struct that permits 2 blas_memory_alloc()/..free() instances at any time for each of the OpenBLAS (NUM_)threads, counted across all OpenBLAS threads actually occurring in the current process. It was recently...
That's an old codebase; most of the concern probably goes away once USE_TLS becomes stable. 50 and MAX_NUMBER just buy some time for USE_TLS to be staged before general use.
If you use OMP with an OMP build of OpenBLAS under the hood (like the one available with Debian/Ubuntu), it falls back to a single thread when it detects that it is inside a parallel section. Other...
That's upstream (Netlib LAPACK) code that does not run in parallel. The cblas symbols are provided directly by OpenBLAS without an extra wrapper.
Since your code uses OpenMP, you need to build OpenBLAS with OpenMP support. That "support" is quite rudimentary and turns into single-threaded OpenBLAS computation inside your parallel sections. Complementing to...
Namely, the following FAQ entries apply: https://github.com/xianyi/OpenBLAS/wiki/faq#debianlts https://github.com/xianyi/OpenBLAS/wiki/faq#wronglibrary
Thread safety has probably improved a lot since that warning was introduced, and nothing has hung recently. The number of threads detected is not as important as the reduction in total run time.
You can measure CPU usage with the "time" command: if user + system is greater than the total (real) elapsed time, then threads are being used.
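The same comparison can be made from inside a process. Below is a minimal sketch in Python, assuming the workload is wrapped in a callable; the names `cpu_and_wall`, `looks_multithreaded` and the `margin` value are hypothetical, chosen only for illustration. It relies on `time.process_time()`, which reports user + system CPU time summed over all threads of the current process.

```python
import time


def cpu_and_wall(fn):
    """Run fn() and return (cpu_seconds, wall_seconds) for the call.

    cpu_seconds is user + system CPU time of the whole process,
    summed over all of its threads; wall_seconds is elapsed time.
    """
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu, wall


def looks_multithreaded(cpu, wall, margin=1.2):
    """Heuristic: CPU time clearly above wall time suggests several threads ran.

    margin is an arbitrary illustrative threshold, not a value from OpenBLAS.
    """
    return cpu > margin * wall
```

For example, `cpu, wall = cpu_and_wall(work)` followed by `looks_multithreaded(cpu, wall)`, where `work` is whatever calls into the BLAS library. A ratio close to 1 suggests the computation ran on a single thread, as it would inside an OpenMP parallel section.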