
For multithreaded GEMM in OpenBLAS, do I need to call API in the program to ensure thread affinity?

Open AnonymousYWL opened this issue 3 years ago • 4 comments

AnonymousYWL avatar Feb 15 '22 13:02 AnonymousYWL

openblas_set_num_threads()

Has this API ensured thread affinity?

AnonymousYWL avatar Feb 15 '22 13:02 AnonymousYWL

No, it is not guaranteed. Once the affinity code sets the main thread's mask to a single CPU, there is no way out of that core, and that applies to child processes too. Makefile.rule describes the same issue from another point of view. Please check https://www.postgresql.org/message-id/[email protected] for some options you can tweak to get (10x) closer to affined threading without the dangers mentioned above. By default, threads with heavy CPU usage are supposed to stay on different CPUs, and when the numbers are identical they more or less do. Please measure the effect of any tweaks you apply. If you actually suspect a performance problem, measure the wall time spent in each of the following:

  • a single-threaded call
  • a call limited to one NUMA node (if you have a server CPU or multiple CPUs)
  • an all-threaded call
  • an all-threaded call with the tweaks applied

They should get faster in that order.
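The wall-time comparison above can be sketched as a small timing harness. This is a minimal sketch, not code from OpenBLAS: `wall_seconds`, `naive_gemm` and `time_gemm` are hypothetical names, and the naive multiplication is only a stand-in so the example builds without linking any library. For a real measurement, the stand-in would presumably be replaced by the actual GEMM call (for example `cblas_dgemm`), issued after `openblas_set_num_threads()` with the thread count under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Keeps the compiler from discarding the timed computation. */
static volatile double sink;

/* Monotonic wall-clock time in seconds. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in workload: C = A * B for row-major n x n matrices.
 * Assumption: in a real test this is where the OpenBLAS GEMM call goes. */
void naive_gemm(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Best wall time over `repeats` runs of one n x n multiplication.
 * Returns -1.0 if memory allocation fails. */
double time_gemm(size_t n, int repeats)
{
    assert(n > 0 && repeats > 0);

    double *a = malloc(n * n * sizeof *a);
    double *b = malloc(n * n * sizeof *b);
    double *c = malloc(n * n * sizeof *c);
    double best = -1.0;

    if (a && b && c) {
        for (size_t i = 0; i < n * n; i++) {
            a[i] = 1.0;
            b[i] = 2.0;
        }
        for (int r = 0; r < repeats; r++) {
            double start = wall_seconds();
            naive_gemm(n, a, b, c);
            double elapsed = wall_seconds() - start;
            sink = c[0];
            if (best < 0.0 || elapsed < best)
                best = elapsed;
        }
    }

    free(a);
    free(b);
    free(c);
    return best;
}
```

Calling `time_gemm` once per configuration in the list and comparing the results gives the ordering check. Taking the best of several repeats reduces noise from other processes, and the matrix needs to be large enough that threading overhead does not dominate.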

If not, you can try drilling down into the heaviest OpenBLAS parts of your code with perf record ; perf report, a profiler that does not need compiled-in instrumentation. Then rinse and repeat with a "reduced sample": if you find a regression, it is very easy to understand there.

brada4 avatar Feb 15 '22 14:02 brada4

@AnonymousYWL any update on your side?

brada4 avatar Feb 18 '22 15:02 brada4

No, thanks for your reply.

AnonymousYWL avatar Feb 19 '22 03:02 AnonymousYWL