
thread barriers need backoff

Open jeffhammond opened this issue 3 years ago • 2 comments

The barrier spin loop below leads to serious problems when hardware threads are oversubscribed. Adding sched_yield() to the loop body reduces the problem by one or two orders of magnitude, because it prompts the kernel to switch threads so that progress happens more quickly.

// If the current thread is NOT the last thread to have arrived, then
// it spins on the sense variable until that sense variable changes at
// which time these threads will exit the barrier.
while ( __atomic_load_n( &comm->barrier_sense, __ATOMIC_ACQUIRE ) == orig_sense )
     ; // Empty loop body.

I tried no-op instructions but those do not help in the oversubscribed case, because they don't trigger a context switch. Those backoffs are appropriate when memory access contention is the issue.
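For concreteness, here is a minimal sketch of a sense-reversing barrier whose wait loop yields instead of spinning on an empty body. It is not the BLIS source: `demo_comm_t`, `demo_barrier`, and the field layout are hypothetical stand-ins modeled on the snippet above, and it assumes a POSIX system (for `sched_yield()`) and a compiler with the GCC-style `__atomic` builtins.

```c
#include <sched.h>  // sched_yield() (POSIX)

// Hypothetical stand-in for the communicator; not the actual BLIS struct.
typedef struct
{
    int n_threads;                // number of threads sharing the barrier
    int barrier_threads_arrived;  // arrivals in the current round
    int barrier_sense;            // flips between 0 and 1 each round
} demo_comm_t;

static void demo_barrier( demo_comm_t* comm )
{
    // Record the sense of the current round before announcing arrival.
    int orig_sense = __atomic_load_n( &comm->barrier_sense, __ATOMIC_RELAXED );

    int arrived = __atomic_add_fetch( &comm->barrier_threads_arrived, 1,
                                      __ATOMIC_ACQ_REL );

    if ( arrived == comm->n_threads )
    {
        // Last thread to arrive: reset the counter, then flip the sense
        // to release the waiting threads.
        __atomic_store_n( &comm->barrier_threads_arrived, 0, __ATOMIC_RELAXED );
        __atomic_store_n( &comm->barrier_sense, !orig_sense, __ATOMIC_RELEASE );
    }
    else
    {
        // Not the last thread: wait for the sense to flip, but give up the
        // CPU after every failed check so that an oversubscribed kernel can
        // run the thread that still has to arrive.
        while ( __atomic_load_n( &comm->barrier_sense, __ATOMIC_ACQUIRE ) == orig_sense )
            sched_yield();
    }
}
```

The only change relative to a pure spin is the loop body. A pause or no-op instruction in that position keeps the waiting thread on the core, whereas sched_yield() asks the scheduler to run something else.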

This is related but complementary to https://github.com/flame/blis/issues/603. This is a new version of https://github.com/flame/blis/pull/82.

My proposed fix will allow a user to disable sched_yield(), but I assert we need it enabled in the distribution builds of BLIS, because quality of service is more important than the last bit of performance in the general case. Benchmarking use cases can disable it where it is expected to matter.
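One way such an opt-out could look is a compile-time switch around the backoff call. This is only a sketch under assumptions: the macro names `DEMO_DISABLE_BARRIER_YIELD` and `DEMO_BARRIER_BACKOFF` and the helper `demo_spin_until_changed` are hypothetical and are not existing BLIS configure options or functions.

```c
#include <sched.h>  // sched_yield() (POSIX)

// Hypothetical build switch, not an actual BLIS option: yielding is the
// default, and defining DEMO_DISABLE_BARRIER_YIELD (e.g. for benchmarking
// builds) turns the backoff into a no-op, i.e. a pure spin.
#ifdef DEMO_DISABLE_BARRIER_YIELD
  #define DEMO_BARRIER_BACKOFF() ( ( void )0 )
#else
  #define DEMO_BARRIER_BACKOFF() ( ( void )sched_yield() )
#endif

// Waits until *sense differs from orig_sense and returns how many times
// the backoff was invoked while waiting.
static unsigned long demo_spin_until_changed( int* sense, int orig_sense )
{
    unsigned long n_backoffs = 0;

    while ( __atomic_load_n( sense, __ATOMIC_ACQUIRE ) == orig_sense )
    {
        DEMO_BARRIER_BACKOFF();
        ++n_backoffs;
    }

    return n_backoffs;
}
```

With this shape the default build keeps the yield, and a benchmarking build would opt out with something like `-DDEMO_DISABLE_BARRIER_YIELD` on the compiler command line.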

References

  • https://github.com/flame/blis/issues/588#issuecomment-1014416925
  • https://github.com/flame/blis/issues/588#issuecomment-1021624825

jeffhammond • Jan 28 '22 12:01

@jeffhammond sched_yield is too heavyweight and not portable enough to be used all the time. I will update #82 with a general framework for config-specific behavior and then we can start filling in the actual implementation.

devinamatthews • Jan 30 '22 15:01

@jeffhammond suggestions for any specific architectures?

devinamatthews • Jan 31 '22 18:01