Andrew

Results: 724 comments by Andrew

Are you certain you are using the same OpenBLAS library for each test?

You can try omp_get_num_threads(); I think openblas_get_. just gets the number from there.
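A minimal sketch of what the OpenMP calls report, in case it helps with checking. It assumes a compiler with OpenMP support (for example gcc with -fopenmp) and falls back to 1 when OpenMP is not enabled. The helper names are hypothetical and not part of OpenBLAS. Note that omp_get_num_threads() returns 1 when called from serial code, so omp_get_max_threads() is probably the more useful number outside a parallel region.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: value seen from serial code.
 * omp_get_num_threads() returns 1 outside a parallel region. */
int threads_outside_parallel(void) {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1; /* built without OpenMP */
#endif
}

/* Hypothetical helper: upper bound on the team size of the next
 * parallel region that has no num_threads clause. */
int threads_upper_bound(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1; /* built without OpenMP */
#endif
}

/* Hypothetical helper: actual team size, queried inside a parallel region. */
int threads_inside_parallel(void) {
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread of the team writes the shared result. */
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Printing these three values from the test program, in both the fast and the slow case, should show whether the thread count really differs between runs.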

You can redefine anything you find in /common_macro.h

Can you get 'perf record "command" ; perf report' from both cases? ...zoom in to see which functions in libopenblas.so.0 are being touched (probably add iterations so that more is seen)...

There is no need for perf.data; it is nearly impossible to decode on even a slightly different system. It does not look very specific to your setup, just that 10/20 cores dramatize...

Looking at it:
* you must start checking the thread magic so that we manipulate our own thread (the current logic is exactly the reverse)
* then you zap things only in the places you need...

Distinguish "ours" from "main" and "others"
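A rough sketch of one way such a distinction could be made with plain pthreads. This is only an illustration, not the OpenBLAS code: every name here (thread_kind, remember_main_thread, register_worker, classify_thread, MAX_WORKERS) is hypothetical, and the table is not protected by a lock, so registration is assumed to happen before any lookups.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64 /* hypothetical limit, for illustration only */

typedef enum { THREAD_MAIN, THREAD_OURS, THREAD_OTHER } thread_kind;

static pthread_t main_thread;          /* set once at start-up */
static pthread_t workers[MAX_WORKERS]; /* threads the library created */
static size_t num_workers = 0;

/* Call once from the main thread before any classification. */
void remember_main_thread(void) {
    main_thread = pthread_self();
}

/* Record a thread the library created; returns -1 if the table is full. */
int register_worker(pthread_t t) {
    if (num_workers >= MAX_WORKERS)
        return -1;
    workers[num_workers++] = t;
    return 0;
}

/* Decide whether a thread is the main one, one of ours, or someone else's. */
thread_kind classify_thread(pthread_t t) {
    if (pthread_equal(t, main_thread))
        return THREAD_MAIN;
    for (size_t i = 0; i < num_workers; i++) {
        if (pthread_equal(t, workers[i]))
            return THREAD_OURS;
    }
    return THREAD_OTHER;
}
```

With something like this, affinity or cleanup code could call classify_thread(pthread_self()) and act only when the result is THREAD_OURS, leaving the main thread and foreign threads alone.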

Does it affect the default build, which does not play with affinity and lets the 10-years-newer operating system scheduler place processes on processors? Where you say NUMA it is...

There is a newer compiler at softwarecollections.org named devtoolset-?-gcc

It is selectable; you don't have to change the system compiler. Slightly dated instructions are here: https://github.com/xianyi/OpenBLAS/wiki/faq#binutils