
Undocumented Feature NUM_PARALLEL

Status: open. lightsighter opened this issue on Aug 13 '18.

This is more of a comment about an undocumented feature, in case other users encounter a similar problem. I have an implementation of the OpenMP runtime that supports multiple copies of the OpenMP runtime in the same process. I had threads bound to different OpenMP runtimes calling into OpenBLAS simultaneously, but OpenBLAS was serializing their execution, which caused poor performance. The relevant bit of code is here:

https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server_omp.c#L321-L336

Effectively, there is a fixed number of buffers available for managing parallel OpenMP calls, and the default is 1. So if multiple OpenMP runtimes call into OpenBLAS at the same time, only one of them can make progress while all the others spin-wait for the single available buffer. It seems the right fix is to set NUM_PARALLEL to the upper bound on the number of OpenMP runtimes that can exist in a process.

https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L197-L199

This will then set the max parallel number:

https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L1015

and then that will fill in extra buffers for OpenMP usage:

https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server_omp.c#L57-L62

As far as I can tell this isn't documented anywhere. Maybe I just missed it. Please feel free to point me at the proper documentation if I did overlook it.

lightsighter, Aug 13 '18 20:08

This relatively recent addition (from PR #1536) is sort of documented in Makefile.rule (where most of the build-time definitions for the gmake build system live).

martin-frbg, Aug 13 '18 20:08

@lightsighter I wonder why, in your opinion, you prefer to keep the option "undocumented" and post a wiki page into the bug tracker.

brada4, Aug 14 '18 05:08

I think it would still be good to document it somewhere in the README, just as is done with the USE_OPENMP and DEBUG options.

@brada4 I don't want it to be undocumented; I would prefer it to be explicitly documented. I just didn't know where else to put this so that other users could search for this and find it.

lightsighter, Aug 15 '18 07:08

Makefile.rule describes it; could you make a FAQ page with more explanation?

brada4, Aug 15 '18 14:08

Finally added this to the user documentation in the wiki

martin-frbg, Jan 09 '24 16:01

Where in the wiki can I find the change?

lightsighter, Jan 09 '24 19:01

No need to start format wars. It is here, in the source distribution: https://github.com/OpenMathLib/OpenBLAS/blob/7a6a24647df49bb0797dfe3d0f43f4dc1389aa41/Makefile.rule#L93

brada4, Jan 09 '24 20:01

It is on the "Build system overview" page for now. The whole thing needs to be restructured, and so far I have basically only paraphrased your original posting. I am currently looking into whether and how this kludge can be made to really work with multiple concurrent instances, given that there is also some state information buried in the queue buffer structure that we don't want to end up in the wrong task.

martin-frbg, Jan 09 '24 20:01

To be clear, that documentation doesn't actually cover the scenario described in this issue (somebody renamed this issue incorrectly). The issue being discussed here is NOT multiple threads calling into OpenBLAS at the same time. The issue raised here is that OpenBLAS does not currently support using multiple copies of the OpenMP runtime in the same process. Imagine I have two threads, each of which is bound to a separate copy of the OpenMP runtime (and each OpenMP runtime has its own thread pool). If each of these threads calls into OpenBLAS and wants OpenBLAS to dispatch to its associated OpenMP runtime, that does not work, because global variables inside OpenBLAS's implementation make the (invalid) assumption that there is never more than one OpenMP runtime in the same process.

lightsighter, Jan 09 '24 20:01

I'm not sure this issue got renamed, there is another open issue (by you as well, IIRC) that covers the somewhat unusual "multiple separate copies of OpenMP" topic. I have some vague hopes of addressing that scenario with what I'm currently working on as well, but it could all fall apart still.

martin-frbg, Jan 09 '24 20:01

Ah, maybe I'm getting it confused then. I should have read the rest of the history. Sorry for the confusion.

I have some vague hopes of addressing that scenario with what I'm currently working on as well, but it could all fall apart still.

Thanks! Looking forward to it! Good luck!

lightsighter, Jan 09 '24 20:01

The other one is #2164.

martin-frbg, Jan 09 '24 23:01