Conditionally harden barriers.

Open devinamatthews opened this issue 6 months ago • 0 comments

Details:

  • Adds a configuration flag --harden-barriers (disabled by default).
  • When enabled, each thread records a) the sense value it currently observes in the barrier object (in this mode, the sense variable is incremented rather than XOR-toggled between 0 and 1, to prevent ABA problems), and b) the source location of the call to bli_thrinfo_barrier or bli_thrinfo_bcast, stored as the address of a string literal. If any thread in a team records different information from its peers, a diagnostic is printed and the program aborts.
  • Recording this information requires an additional dynamically-allocated array and some extra reads/writes during the barrier. I haven't measured it, but the performance impact should be small (and the feature is opt-in).
  • This should detect errors such as conditionally-taken barriers within a thread team, use of the incorrect thread info object, threads escaping barriers early, etc.
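
To illustrate the idea, here is a minimal sketch of a hardened counting barrier in plain C11. It is not the code in this PR: every name (hb_barrier_t, hb_record_t, hb_init, hb_first_mismatch, hb_barrier, HB_BARRIER) is hypothetical, and having the last-arriving thread perform the comparison is an assumption made for the example. It only shows the two recorded fields, the incremented sense, and the peer check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not the BLIS thrcomm_t layout. */
typedef struct {
    unsigned    sense;  /* sense value the thread observed on entry */
    const char *where;  /* address of a string literal naming the call site */
} hb_record_t;

typedef struct {
    int          n_threads;
    atomic_int   arrived;   /* threads that have reached the barrier */
    atomic_uint  sense;     /* incremented per barrier, never toggled */
    hb_record_t *records;   /* the extra dynamically allocated array */
} hb_barrier_t;

#define HB_STR2(x) #x
#define HB_STR(x)  HB_STR2(x)
/* Passes "file:line" as one string literal, so peers can compare addresses. */
#define HB_BARRIER(b, tid) hb_barrier((b), (tid), __FILE__ ":" HB_STR(__LINE__))

static int hb_init(hb_barrier_t *b, int n_threads)
{
    b->n_threads = n_threads;
    atomic_init(&b->arrived, 0);
    atomic_init(&b->sense, 0u);
    b->records = calloc((size_t)n_threads, sizeof *b->records);
    return b->records ? 0 : -1;
}

static void hb_free(hb_barrier_t *b)
{
    free(b->records);
    b->records = NULL;
}

/* Returns the index of the first thread whose record differs from
   thread 0's record, or -1 if all records agree. */
static int hb_first_mismatch(const hb_barrier_t *b)
{
    for (int i = 1; i < b->n_threads; i++)
        if (b->records[i].sense != b->records[0].sense ||
            b->records[i].where != b->records[0].where)
            return i;
    return -1;
}

static void hb_barrier(hb_barrier_t *b, int tid, const char *where)
{
    assert(tid >= 0 && tid < b->n_threads);

    /* Record what this thread believes about the barrier it is entering. */
    unsigned my_sense = atomic_load(&b->sense);
    b->records[tid].sense = my_sense;
    b->records[tid].where = where;

    if (atomic_fetch_add(&b->arrived, 1) == b->n_threads - 1) {
        /* Last arriver (assumed checker): compare all records, then release. */
        int bad = hb_first_mismatch(b);
        if (bad >= 0) {
            fprintf(stderr,
                    "barrier mismatch: thread %d at %s (sense %u), "
                    "thread 0 at %s (sense %u)\n",
                    bad, b->records[bad].where, b->records[bad].sense,
                    b->records[0].where, b->records[0].sense);
            abort();
        }
        atomic_store(&b->arrived, 0);
        atomic_fetch_add(&b->sense, 1u);  /* increment instead of XOR */
    } else {
        /* Spin until the last arriver advances the sense. */
        while (atomic_load(&b->sense) == my_sense) { }
    }
}
```

Each thread in a team would call HB_BARRIER(&bar, tid) at the same source line. With a 0/1 sense, a thread that is exactly two barriers behind observes the same value as its peers; an incrementing counter keeps the two cases distinguishable, which is the ABA concern mentioned above.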

Limitations:

  • Both calls to bli_thrcomm_barrier within bli_thrinfo_bcast receive the same source line information. However, the check on the sense variable should still catch any problems.
  • Certain problems (such as missing a broadcast) may still manifest as illegal memory accesses or memory corruption before the problem can be detected in a later barrier.
  • Not implemented for tree barriers yet. I would prefer to refactor the tree and non-tree barriers as a unified implementation first.

devinamatthews, Jun 22 '25 21:06