Multithreaded Programming using OpenBLAS

Open solahn opened this issue 2 years ago • 7 comments

I want to create 3 threads, each of which calls the gemm function from OpenBLAS.

Thread 1 - core affinity with CPU core 1 (openblas_set_num_threads(1))
Thread 2 - core affinity with CPU cores 2,3 (openblas_set_num_threads(2))
Thread 3 - core affinity with CPU cores 4,5,6 (openblas_set_num_threads(3))

Can those threads run at the same time without modifying any code in your library?

https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded

I saw the link above. If I use the USE_THREAD and USE_LOCKING build options, will it behave as in the situation above?

If that is correct, how do I use them?

solahn avatar Sep 06 '23 14:09 solahn

All 3 of your threads will be serialized within a single 3-thread 'thread'. If your 3 threads are separate processes instead, it will work as you expect.
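For illustration, a minimal sketch of the multi-process variant, assuming a POSIX system. `HAVE_OPENBLAS`, `child_work` and `run_three_processes` are hypothetical names introduced here, not part of OpenBLAS; without `HAVE_OPENBLAS` a plain triple loop stands in for the library call so the sketch builds on its own. Core pinning is left as a comment, since it is platform-specific.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifdef HAVE_OPENBLAS /* hypothetical macro: define it when building against OpenBLAS */
#include <cblas.h>
#endif

/* Multiply two n x n row-major matrices inside the calling process.
 * Returns 0 if the result looks right, 1 otherwise. */
static int child_work(int threads, int n)
{
    assert(threads > 0 && n > 0);
    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof *a);
    float *b = malloc(count * sizeof *b);
    float *c = calloc(count, sizeof *c);
    if (!a || !b || !c) { free(a); free(b); free(c); return 1; }
    for (size_t i = 0; i < count; i++) { a[i] = 1.0f; b[i] = 2.0f; }
#ifdef HAVE_OPENBLAS
    openblas_set_num_threads(threads); /* thread count for this process only */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0f, a, n, b, n, 0.0f, c, n);
#else
    /* Stand-in so the sketch builds without OpenBLAS: plain triple loop. */
    (void)threads;
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
#endif
    int ok = (c[0] == 2.0f * (float)n); /* every entry should equal 2n */
    free(a); free(b); free(c);
    return ok ? 0 : 1;
}

/* Fork one process per workload (1, 2 and 3 BLAS threads).
 * Returns the number of children that failed. */
int run_three_processes(void)
{
    const int threads[3] = {1, 2, 3};
    pid_t pids[3];
    int failures = 0;

    for (int i = 0; i < 3; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            /* Child: a core-affinity call for this process would go here. */
            _exit(child_work(threads[i], 64));
        }
        if (pids[i] < 0)
            failures++; /* fork failed */
    }
    for (int i = 0; i < 3; i++) {
        int status = 0;
        if (pids[i] > 0 &&
            (waitpid(pids[i], &status, 0) < 0 ||
             !WIFEXITED(status) || WEXITSTATUS(status) != 0))
            failures++;
    }
    return failures;
}
```

Each child has its own address space, so the three gemm calls do not contend for one shared OpenBLAS instance.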

brada4 avatar Sep 07 '23 10:09 brada4

I want to run the 3 threads in parallel, not serially. Is that possible within a single process?

solahn avatar Sep 07 '23 11:09 solahn

If you build with USE_THREAD=0 but keep USE_LOCKING, then you can schedule single-threaded OpenBLAS calls from your own threads. You cannot have both at once.
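A rough sketch of what that could look like with pthreads, assuming OpenBLAS was built with something like `make USE_THREAD=0 USE_LOCKING=1`. `HAVE_OPENBLAS`, `gemm_worker` and `run_three_threads` are hypothetical names introduced here; without `HAVE_OPENBLAS` a plain triple loop stands in for `cblas_sgemm` so the sketch builds on its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#ifdef HAVE_OPENBLAS /* hypothetical macro: define it when building against OpenBLAS */
#include <cblas.h>
#endif

enum { N = 64, NTHREADS = 3 };

/* Each thread owns its inputs and its output, so no buffer is shared. */
struct job { float *a, *b, *c; int ok; };

static void *gemm_worker(void *arg)
{
    assert(arg != NULL);
    struct job *j = arg;
#ifdef HAVE_OPENBLAS
    /* Single-threaded gemm, called concurrently from application threads. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0f, j->a, N, j->b, N, 0.0f, j->c, N);
#else
    /* Stand-in so the sketch builds without OpenBLAS. */
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int col = 0; col < N; col++)
                j->c[i * N + col] += j->a[i * N + k] * j->b[k * N + col];
#endif
    j->ok = (j->c[0] == 2.0f * N); /* every entry should equal 2N */
    return NULL;
}

/* Start NTHREADS workers, wait for them, return the number that failed. */
int run_three_threads(void)
{
    pthread_t tid[NTHREADS];
    struct job jobs[NTHREADS];
    int started[NTHREADS];
    int failures = 0;

    for (int t = 0; t < NTHREADS; t++) {
        struct job *j = &jobs[t];
        j->a = malloc(N * N * sizeof *j->a);
        j->b = malloc(N * N * sizeof *j->b);
        j->c = calloc(N * N, sizeof *j->c);
        j->ok = 0;
        started[t] = 0;
        if (!j->a || !j->b || !j->c)
            continue; /* counted as a failure below */
        for (int i = 0; i < N * N; i++) { j->a[i] = 1.0f; j->b[i] = 2.0f; }
        started[t] = (pthread_create(&tid[t], NULL, gemm_worker, j) == 0);
    }
    for (int t = 0; t < NTHREADS; t++) {
        if (started[t])
            pthread_join(tid[t], NULL); /* results are read only after the join */
        if (!jobs[t].ok)
            failures++;
        free(jobs[t].a); free(jobs[t].b); free(jobs[t].c);
    }
    return failures;
}
```

In this arrangement the parallelism comes from the application's threads, and each OpenBLAS call itself runs on a single thread.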

brada4 avatar Sep 07 '23 11:09 brada4

If I set USE_THREAD=1 and USE_LOCKING=0, is the above situation possible without locking?

I want to program this without race conditions.

HaileeKim avatar Sep 07 '23 11:09 HaileeKim

There is a common structure in memory.c that tracks allocations, and you need locking for that. USE_THREAD means dllinit()/.init() will start one thread per core at startup; for the aforementioned reason, locking is implied in this case.

brada4 avatar Sep 07 '23 11:09 brada4

Then, if I program with multiple processes instead of multithreading, do those programs not have locking?

HaileeKim avatar Sep 07 '23 17:09 HaileeKim

OpenBLAS locks its own critical structures when the single-threaded version is called concurrently. Obviously, you have to guard writes so that they never hit the same place concurrently, and guard reads so that they never pick up half-written data, including data that OpenBLAS is working on. Think of memcpy()-like behaviour with heavy computation in the process.
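One way to picture that rule, as a sketch only: several workers share one output matrix, each writes a disjoint block of rows, and the caller reads the result only after `pthread_join`, which serves as the synchronization point here. The names `fill_rows` and `shared_output_checksum` are hypothetical, and the inner loops stand in for a gemm call on each row block.

```c
#include <assert.h>
#include <pthread.h>

enum { N = 8, NWORKERS = 2 };

/* A and B are only read by the workers; C is written in disjoint row blocks. */
static float A[N * N], B[N * N], C[N * N];

struct slice { int row_begin, row_end; };

/* Compute rows [row_begin, row_end) of C = A * B. No other worker
 * touches these rows, so the writes never overlap. */
static void *fill_rows(void *arg)
{
    const struct slice *s = arg;
    for (int i = s->row_begin; i < s->row_end; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    return NULL;
}

/* Run the workers, wait for all of them, then read C and return its sum. */
float shared_output_checksum(void)
{
    assert(N % NWORKERS == 0); /* rows split evenly between workers */
    pthread_t tid[NWORKERS];
    struct slice s[NWORKERS];
    int started[NWORKERS];

    /* Inputs are fully written before any worker starts. */
    for (int i = 0; i < N * N; i++) { A[i] = 1.0f; B[i] = 2.0f; }

    for (int w = 0; w < NWORKERS; w++) {
        s[w].row_begin = w * (N / NWORKERS);
        s[w].row_end = (w + 1) * (N / NWORKERS);
        started[w] = (pthread_create(&tid[w], NULL, fill_rows, &s[w]) == 0);
        if (!started[w])
            fill_rows(&s[w]); /* fall back to computing this block inline */
    }
    /* Do not read C until every writer has finished. */
    for (int w = 0; w < NWORKERS; w++)
        if (started[w])
            pthread_join(tid[w], NULL);

    float sum = 0.0f;
    for (int i = 0; i < N * N; i++)
        sum += C[i];
    return sum;
}
```

Reading `C` before the joins would risk seeing rows that a worker has not finished writing.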

brada4 avatar Sep 07 '23 17:09 brada4