OpenBLAS
Multithreaded Programming using OpenBLAS
I want to create three threads that each call the OpenBLAS gemm function:
Thread 1: affinity to CPU core 1 (openblas_set_num_threads(1))
Thread 2: affinity to CPU cores 2 and 3 (openblas_set_num_threads(2))
Thread 3: affinity to CPU cores 4, 5 and 6 (openblas_set_num_threads(3))
Will these threads run at the same time without modifying any of the library code?
https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
I read that FAQ entry. If I use the USE_THREAD and USE_LOCKING build options, will it behave as described above?
If so, how do I use them?
All three of your threads will be serialized: their calls run one after another on a single OpenBLAS thread pool of 3 threads. If your three threads are separate processes instead, it will work as you expect.
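A minimal sketch of the process-based layout, assuming a POSIX system. Each child is a separate process, so whatever it configures (core affinity, openblas_set_num_threads()) stays private to that child. The worker() body is a hypothetical stand-in for the real cblas_dgemm() call so that the sketch compiles without OpenBLAS; spawn_workers() is likewise an illustrative name, and the sketch assumes OpenBLAS has not been used in the parent before fork().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for one worker's job. In a real program the child
   would set its core affinity, call openblas_set_num_threads(id + 1) and then
   cblas_dgemm(); those settings stay local to this process. */
static int worker(int id) {
    volatile double acc = 0.0;
    for (int i = 0; i < 1000000; i++)
        acc += (double)(id + 1);
    return acc > 0.0 ? 0 : 1; /* exit status 0 means success */
}

/* Fork nproc children, run worker(id) in each, and return how many exited
   with status 0 (or -1 for an invalid nproc). */
int spawn_workers(int nproc) {
    pid_t pids[MAX_WORKERS];
    int spawned = 0;
    if (nproc < 1 || nproc > MAX_WORKERS)
        return -1;
    for (int id = 0; id < nproc; id++) {
        pid_t pid = fork();
        if (pid < 0)
            break;             /* fork failed: reap the children we have */
        if (pid == 0)
            _exit(worker(id)); /* child: run the job, never return */
        pids[spawned++] = pid;
    }
    int ok = 0;
    for (int i = 0; i < spawned; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

For example, spawn_workers(3) starts three children and returns 3 if all of them exit successfully. Children use _exit() so that stdio buffers inherited from the parent are not flushed twice.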
I want the three threads to run in parallel, not serially, within a single process. Is that possible?
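If you build with USE_THREAD=0 but keep USE_LOCKING=1, you can schedule single-threaded OpenBLAS calls from your own threads. You cannot have both at once (OpenBLAS's internal threading and concurrent calls from several of your threads).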
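A sketch of that arrangement, assuming an OpenBLAS built with `make USE_THREAD=0 USE_LOCKING=1`: each of your threads makes its own single-threaded gemm call on matrices it owns. To keep the sketch self-contained, naive_dgemm() stands in for cblas_dgemm(); job_t, job() and run_three_jobs() are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* One independent job per thread (hypothetical structure). */
typedef struct {
    int id;        /* 0, 1 or 2 */
    int n;         /* square matrix size */
    double corner; /* C[0][0] after the multiply, for checking */
    int rc;        /* 0 on success, -1 on allocation failure */
} job_t;

/* Naive single-threaded C = A * B for row-major n x n matrices. Stand-in for
   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n,
               1.0, A, n, B, n, 0.0, C, n) on a USE_THREAD=0 build. */
static void naive_dgemm(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

static void *job(void *arg) {
    job_t *jb = arg;
    size_t sz = (size_t)jb->n * (size_t)jb->n;
    /* Each thread owns its A, B and C, so no two threads write the same memory. */
    double *A = malloc(sz * sizeof *A);
    double *B = malloc(sz * sizeof *B);
    double *C = malloc(sz * sizeof *C);
    jb->rc = (A && B && C) ? 0 : -1;
    if (jb->rc == 0) {
        for (size_t i = 0; i < sz; i++) {
            A[i] = jb->id + 1; /* every entry of A is id + 1 */
            B[i] = 1.0;
        }
        naive_dgemm(jb->n, A, B, C);
        jb->corner = C[0]; /* equals n * (id + 1) */
    }
    free(A);
    free(B);
    free(C);
    return NULL;
}

/* Run three independent multiplies concurrently; corners[t] receives C[0][0]
   of job t. Returns 0 on success, -1 on failure. */
int run_three_jobs(int n, double corners[3]) {
    pthread_t th[3];
    job_t jobs[3];
    int created = 0, rc = 0;
    if (n < 1)
        return -1;
    for (int t = 0; t < 3; t++) {
        jobs[t] = (job_t){ .id = t, .n = n, .corner = 0.0, .rc = -1 };
        if (pthread_create(&th[t], NULL, job, &jobs[t]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    for (int t = 0; t < created; t++) {
        pthread_join(th[t], NULL);
        if (jobs[t].rc != 0)
            rc = -1;
        corners[t] = jobs[t].corner;
    }
    return rc;
}
```

With n = 4, the three jobs produce corner values 4, 8 and 12. Giving every thread its own buffers is the design choice that keeps your code free of races; the library's own internal state is what USE_LOCKING=1 protects.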
If I set USE_THREAD=1 and USE_LOCKING=0, is the situation above possible without locking?
I want to write this without race conditions.
There is a common structure in memory.c that tracks allocations, and it needs locking. With USE_THREAD, dllinit()/.init() starts one thread per core at startup, so for the reason above, locking is implied in this case.
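To illustrate why a shared allocation table needs a lock, here is a simplified, hypothetical buffer pool guarded by a pthread mutex. It is not the actual code in memory.c; NUM_SLOTS, table_acquire(), table_release() and hammer() are illustrative names.

```c
#include <pthread.h>

#define NUM_SLOTS 8
#define SLOT_BYTES 4096
#define MAX_THREADS 16

/* Hypothetical shared table: a fixed pool of buffers used by every thread. */
static unsigned char slot_mem[NUM_SLOTS][SLOT_BYTES];
static int slot_used[NUM_SLOTS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim a free buffer and return its index, or -1 if all are in use.
   Without the lock, two threads could both see a slot as free and claim it. */
int table_acquire(void) {
    int idx = -1;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            idx = i;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return idx;
}

/* Return a buffer to the pool. */
void table_release(int idx) {
    pthread_mutex_lock(&table_lock);
    if (idx >= 0 && idx < NUM_SLOTS)
        slot_used[idx] = 0;
    pthread_mutex_unlock(&table_lock);
}

/* Demo: count how often a slot is found with more than one holder. */
static int holders[NUM_SLOTS];
static int conflicts; /* protected by table_lock */

static void *hammer_thread(void *arg) {
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        int s = table_acquire();
        if (s < 0)
            continue;
        holders[s]++; /* we should be the only holder of slot s */
        if (holders[s] != 1) {
            pthread_mutex_lock(&table_lock);
            conflicts++;
            pthread_mutex_unlock(&table_lock);
        }
        slot_mem[s][0] = (unsigned char)i; /* use the buffer */
        holders[s]--;
        table_release(s);
    }
    return NULL;
}

/* Run nthreads threads that each acquire/release iters times; returns the
   number of conflicts seen (0 when the locking is correct), or -1 on error. */
int hammer(int nthreads, int iters) {
    pthread_t th[MAX_THREADS];
    int created = 0;
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    conflicts = 0;
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&th[t], NULL, hammer_thread, &iters) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(th[t], NULL);
    return created == nthreads ? conflicts : -1;
}
```

Because acquire and release both go through table_lock, a slot is handed to at most one thread at a time, so hammer(4, 10000) reports 0 conflicts.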
Then, if I use multiple processes instead of multiple threads, do those programs run without locking?
OpenBLAS locks its own critical structures when the single-threaded version is called concurrently. You still have to put a barrier after writes so that two writers never update the same location concurrently, and a barrier before reads so that nobody reads half-written data, and this includes data produced by OpenBLAS calls. Think of it as memcpy()-like behaviour, with heavy computation happening inside the process.
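The write-then-read ordering can be sketched with threads writing disjoint slices of one array, then joining all writers (the barrier) before anything is read. Threads are used here for brevity; the same rule applies to processes sharing memory. fill_then_sum() and slice_t are hypothetical names, and the fill values stand in for, e.g., the output blocks of separate gemm calls.

```c
#include <pthread.h>

#define MAX_WRITERS 16

/* Hypothetical work item: one writer owns the half-open range [begin, end). */
typedef struct {
    double *data;
    int begin, end;
} slice_t;

/* Each writer touches only its own range, so no two writers hit the same element. */
static void *fill_slice(void *arg) {
    slice_t *s = arg;
    for (int i = s->begin; i < s->end; i++)
        s->data[i] = (double)i;
    return NULL;
}

/* Fill data[0..n) with nthreads writers, then sum it. Returns 0 on success and
   stores the sum in *sum, or -1 on bad arguments / thread creation failure. */
int fill_then_sum(double *data, int n, int nthreads, double *sum) {
    pthread_t th[MAX_WRITERS];
    slice_t sl[MAX_WRITERS];
    int created = 0, rc = 0;
    if (n < 0 || nthreads < 1 || nthreads > MAX_WRITERS)
        return -1;
    for (int t = 0; t < nthreads; t++) {
        sl[t].data = data;
        sl[t].begin = (int)((long long)n * t / nthreads);
        sl[t].end = (int)((long long)n * (t + 1) / nthreads);
        if (pthread_create(&th[t], NULL, fill_slice, &sl[t]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    /* Write barrier: joining every writer guarantees all writes are complete
       and visible to this thread before anything is read. */
    for (int t = 0; t < created; t++)
        pthread_join(th[t], NULL);
    if (rc != 0)
        return rc;
    /* Read phase: safe only after the barrier above. */
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += data[i];
    *sum = s;
    return 0;
}
```

For n = 1000 and 4 writers, the sum is 0 + 1 + ... + 999 = 499500. Joining is the simplest barrier; a longer-lived pool would use an explicit barrier or condition variable instead.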