About locking when using OpenBLAS with OpenMP

Open KashpurovichYuri opened this issue 1 year ago • 0 comments

While implementing an algorithm, I ran into the question of how best to use OpenBLAS. I have several matrix multiplications inside an omp parallel section, and the resulting matrices should be summed. So it is just $C = C + A * B$ (i.e., the usual dgemm routine with a shared $C$ and private $A$ and $B$ in the omp parallel section).

Could you clarify whether OpenBLAS handles the synchronization optimally here when the library is built with USE_OPENMP=1 USE_LOCKING=1? What I mean is that the summation in $C = C + A * B$ should be done only after a block of $A * B$ has been computed, so that elements of $C$ are not updated very often.

Could you please tell me whether my idea of the OpenBLAS implementation is correct, or whether I need to handle the points above on my own? And if it would be better to study your code instead of asking such questions directly, just say so!)
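To make the question concrete, here is a minimal sketch of the "handle it on my own" variant, in which each thread sums its products into a private buffer and adds that buffer to the shared $C$ once, at the end. It says nothing about what OpenBLAS does internally. The names `gemm_into` and `accumulate_products` are hypothetical, and `gemm_into` is a naive stand-in used only to keep the sketch self-contained; in real code it would presumably be a `cblas_dgemm` call with `beta = 0.0` writing into the private buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for
 *   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
 *               n, n, n, 1.0, A, n, B, n, 0.0, P, n);
 * Computes P = A * B for row-major n x n matrices. */
static void gemm_into(int n, const double *A, const double *B, double *P)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int k = 0; k < n; ++k)
                s += A[i * n + k] * B[k * n + j];
            P[i * n + j] = s;
        }
    }
}

/* C += sum over t of As[t] * Bs[t], where As and Bs hold ntasks
 * row-major n x n matrices stored back to back. C is shared; each
 * thread touches it exactly once, inside the critical section. */
static void accumulate_products(int n, int ntasks,
                                const double *As, const double *Bs,
                                double *C)
{
    const size_t nn = (size_t)n * (size_t)n;

    #pragma omp parallel
    {
        /* Per-thread partial sum (zeroed) and per-thread product buffer. */
        double *local = calloc(nn, sizeof *local);
        double *P = malloc(nn * sizeof *P);
        assert(local != NULL && P != NULL);

        #pragma omp for
        for (int t = 0; t < ntasks; ++t) {
            gemm_into(n, As + (size_t)t * nn, Bs + (size_t)t * nn, P);
            for (size_t i = 0; i < nn; ++i)
                local[i] += P[i];
        }

        /* One synchronized update of the shared C per thread. */
        #pragma omp critical
        {
            for (size_t i = 0; i < nn; ++i)
                C[i] += local[i];
        }

        free(local);
        free(P);
    }
}
```

With this layout the number of synchronized writes to $C$ scales with the number of threads rather than with the number of products. If compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.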

Feb 25 '25 14:02 KashpurovichYuri