
Soundness bugfix for barrier<thread_scope_block> on sm_70

Open · gonzalobg opened this issue 1 year ago · 1 comment

On sm_70, barrier arrive has an optimization that "coalesces" all arrivals applying the same update to the same barrier into a single update performed by a "leader" thread.

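This optimization is missing a release fence that establishes cumulativity between all coalesced threads and the leader before the leader performs the update. Without it, memory operations that the other coalesced threads performed before their arrive are not guaranteed to be ordered before the leader's single update, so a thread that observes the barrier phase completing may not observe them.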
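The ordering problem can be illustrated with a minimal portable C++ sketch. This is an analogue under stated assumptions, not the libcudacxx code: std::threads stand in for the coalesced lanes, a release/acquire handoff on a hypothetical group_arrived counter stands in for the warp-level synchronization, and barrier_count stands in for the barrier's arrival count. All names (CoalescedArriveDemo, lane, waiter, run_demo, kGroup) are hypothetical. The leader issues a release fence after it has synchronized with every lane and before its single relaxed update, so every lane's payload write is cumulatively ordered before that update.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical group size: number of "lanes" whose arrivals are coalesced.
constexpr int kGroup = 4;

struct CoalescedArriveDemo {
    int payload[kGroup] = {};            // plain data each lane writes before arriving
    std::atomic<int> group_arrived{0};   // hypothetical intra-group handoff to the leader
    std::atomic<int> barrier_count{0};   // stands in for the barrier's arrival count

    void lane(int id) {
        payload[id] = (id + 1) * 10;     // data a waiter must observe after the phase completes
        // Release RMW: orders this lane's payload write before its arrival.
        group_arrived.fetch_add(1, std::memory_order_release);
        if (id == 0) {                   // lane 0 plays the "leader"
            // Acquire: wait until every lane's arrival (and prior writes) is visible.
            while (group_arrived.load(std::memory_order_acquire) != kGroup) {
                std::this_thread::yield();
            }
            // Release fence: makes all lanes' writes cumulatively ordered
            // before the single coalesced update below.
            std::atomic_thread_fence(std::memory_order_release);
            barrier_count.fetch_add(kGroup, std::memory_order_relaxed);
        }
    }

    int waiter() {
        // Spin until the coalesced update (+kGroup) is observed.
        while (barrier_count.load(std::memory_order_relaxed) != kGroup) {
            std::this_thread::yield();
        }
        // Acquire fence pairs with the leader's release fence.
        std::atomic_thread_fence(std::memory_order_acquire);
        int sum = 0;
        for (int v : payload) sum += v;  // safe: every payload write happens-before this read
        return sum;
    }
};

// Runs one waiter and kGroup lanes; returns the sum the waiter observed.
int run_demo() {
    CoalescedArriveDemo d;
    int observed = 0;
    std::thread w([&] { observed = d.waiter(); });
    std::vector<std::thread> lanes;
    for (int i = 0; i < kGroup; ++i) {
        lanes.emplace_back([&d, i] { d.lane(i); });
    }
    for (auto& t : lanes) t.join();
    w.join();
    return observed;  // 10 + 20 + 30 + 40
}
```

In this analogue, dropping the leader's release fence while keeping the relaxed update would leave the waiter's reads of payload unordered with respect to the other lanes' writes (a data race in C++ terms), which mirrors the missing cumulativity described above.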

gonzalobg, Aug 10 '22 13:08

@daniellustig, could you review this?

gonzalobg, Aug 10 '22 13:08

The cumulativity fix seems reasonable to me.

daniellustig, Aug 12 '22 13:08

@wmaxey?

gonzalobg, Aug 17 '22 11:08

Thanks for the review, David. Merging.

wmaxey, Sep 15 '22 01:09