blis
Implement use of pthread affinity functions
Read an environment variable, say, BLIS_CPU_AFFINITY, and use its contents to call pthread_setaffinity_np() to set the threads' affinity masks.
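A minimal sketch of what this could look like on Linux with glibc. The format of BLIS_CPU_AFFINITY is not specified in this issue, so the comma-separated CPU list below (thread i takes the i-th entry, wrapping around) is an assumption. The helper names `parse_cpu_list`, `pin_thread_to_cpu`, and `apply_affinity_from_env` are hypothetical and not part of BLIS.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical helper: parse a comma-separated CPU list such as "0,2,4"
 * into ids[]. Returns the number of ids parsed (at most max_ids),
 * or -1 if the string is malformed or an id is out of range. */
int parse_cpu_list(const char *s, int *ids, int max_ids)
{
    int n = 0;
    while (*s != '\0' && n < max_ids) {
        char *end;
        long v = strtol(s, &end, 10);
        if (end == s || v < 0 || v >= CPU_SETSIZE)
            return -1;
        ids[n++] = (int)v;
        if (*end == ',')
            end++;              /* skip the separator */
        else if (*end != '\0')
            return -1;          /* unexpected character */
        s = end;
    }
    return n;
}

/* Hypothetical helper: restrict one thread to a single CPU.
 * Returns 0 on success or an error number from pthread_setaffinity_np(). */
int pin_thread_to_cpu(pthread_t thread, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

/* Hypothetical entry point: read BLIS_CPU_AFFINITY (assumed format: "0,2,4")
 * and pin the given thread to entry thread_index, wrapping around the list.
 * Returns -1 if the variable is unset, empty, or malformed. */
int apply_affinity_from_env(pthread_t thread, int thread_index)
{
    const char *spec = getenv("BLIS_CPU_AFFINITY");
    int ids[CPU_SETSIZE];
    if (spec == NULL || thread_index < 0)
        return -1;
    int n = parse_cpu_list(spec, ids, CPU_SETSIZE);
    if (n <= 0)
        return -1;
    return pin_thread_to_cpu(thread, ids[thread_index % n]);
}
```

Each worker could call `apply_affinity_from_env(pthread_self(), tid)` at startup, treating a non-zero return as "leave affinity unchanged". A one-CPU-per-thread mask is only one possible policy; a real implementation might prefer per-core or per-socket masks.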
Ideally, the same environment variable would control OpenMP thread affinity, in the event that BLIS is configured with OpenMP instead of pthreads, but there may be implementation realities that make this infeasible. Reader: please consider this issue a request for comment.
sched_setaffinity is probably the way to go on Linux. That said, you also need hardware topology information, which is probably best obtained from hwloc.
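For illustration, a rough sketch of the sched_setaffinity route without hwloc. On Linux, passing a pid of 0 applies the mask to the calling thread. The function name `pin_self_to_nth_allowed_cpu` is hypothetical, and using the current affinity mask as the list of candidate CPUs is a stand-in assumption: it says nothing about cores, sockets, or caches, which is where hwloc would come in.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: pin the calling thread to the n-th CPU (wrapping
 * around) in its current affinity mask. Returns the chosen CPU id,
 * or -1 on failure. Call it before any other pinning, since afterwards
 * the mask contains only the chosen CPU. */
int pin_self_to_nth_allowed_cpu(int n)
{
    cpu_set_t allowed;
    if (n < 0 || sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    int count = CPU_COUNT(&allowed);
    if (count == 0)
        return -1;

    int target = n % count;     /* index among the allowed CPUs */
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        if (target-- == 0) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            /* pid 0 means "the calling thread" on Linux */
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Starting from the inherited mask has the side benefit of respecting restrictions already imposed from outside, for example by taskset or a cgroup cpuset.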
And FYI you can't set affinity at all on OSX so Linux-only is fine.
I'm looking for similar features too and am considering implementing them with hwloc.
However, I found that performance drops severely when multithreading is enabled (on ARMv8). I used perf to analyze the issue and found that most of the time is spent in the while loop in bli_thrcomm_barrier(). This needs to be fixed first.
@baozich What happens if you use the OpenMP build of BLIS and set affinity via OMP_PLACES and OMP_PROC_BIND?