
Optimize atomic operations in scan kernel template

Open • adamfidel opened this issue 1 year ago • 1 comment

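It may be possible to reduce the number of atomic operations in the scan kernel template algorithm. Currently, we use atomic loads and stores for both the status flags and the status values. It might be enough to use atomics only for the flags and rely on the standard happens-before relation to guarantee that the updated values are visible. If a producer writes a value and then sets its flag with a release store, a consumer that observes that flag with an acquire load is guaranteed to see the value written before it.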
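A minimal host-side C++ sketch of the idea follows. It assumes a decoupled-lookback layout with one atomic flag and two plain payload slots per tile. `TileStatus`, `tile_prefixes`, and the `kNotReady`/`kPartial`/`kFull` constants are hypothetical names, not oneDPL's types. `std::thread` stands in for work-groups, and a device version would presumably use something like `sycl::atomic_ref` on the flag with the same orderings.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical tile-status layout (illustrative only, not oneDPL's actual types).
// Only `flag` is atomic; the two payload slots are plain integers.
enum : std::uint32_t { kNotReady = 0, kPartial = 1, kFull = 2 };

struct TileStatus {
    std::atomic<std::uint32_t> flag{kNotReady};
    std::int64_t aggregate = 0;         // written once, before flag = kPartial
    std::int64_t inclusive_prefix = 0;  // written once, before flag = kFull
};

// Each tile sums its chunk, publishes its aggregate, looks back over its
// predecessors, then publishes its inclusive prefix.
// Returns the exclusive prefix of every tile.
std::vector<std::int64_t> tile_prefixes(const std::vector<std::vector<std::int64_t>>& tiles) {
    const std::size_t n = tiles.size();
    std::vector<TileStatus> status(n);
    std::vector<std::int64_t> exclusive(n, 0);
    std::vector<std::thread> workers;

    for (std::size_t t = 0; t < n; ++t) {
        workers.emplace_back([&, t] {
            std::int64_t local = 0;
            for (std::int64_t x : tiles[t]) local += x;

            if (t == 0) {
                status[0].inclusive_prefix = local;                      // plain write
                status[0].flag.store(kFull, std::memory_order_release);  // publish it
                return;                                                  // exclusive[0] stays 0
            }

            status[t].aggregate = local;                                 // plain write
            status[t].flag.store(kPartial, std::memory_order_release);   // publish it

            std::int64_t prefix = 0;
            for (std::size_t j = t; j-- > 0;) {  // j = t-1, t-2, ..., 0
                std::uint32_t f;
                // Acquire load: once a non-zero flag is observed, the payload write
                // sequenced before the matching release store is visible here.
                while ((f = status[j].flag.load(std::memory_order_acquire)) == kNotReady)
                    std::this_thread::yield();
                if (f == kFull) {                 // predecessor has its full prefix
                    prefix += status[j].inclusive_prefix;
                    break;
                }
                prefix += status[j].aggregate;    // only the partial aggregate so far
            }
            exclusive[t] = prefix;
            status[t].inclusive_prefix = prefix + local;                 // plain write
            status[t].flag.store(kFull, std::memory_order_release);      // publish it
        });
    }
    for (auto& w : workers) w.join();
    return exclusive;
}
```

For example, tiles `{{1, 2}, {3}, {4, 5, 6}, {}}` give exclusive prefixes `{0, 3, 6, 21}`. The sketch deliberately keeps the aggregate and the inclusive prefix in separate slots, each written exactly once before its own release store. If a single non-atomic value were overwritten after `kPartial` had already been published, a reader that saw `kPartial` could be reading it while the producer writes the new value, which is a data race. That case would still need an atomic value or a different layout.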

Discussed in https://github.com/oneapi-src/oneDPL/pull/1320#pullrequestreview-2019240324.

adamfidel, Apr 24 '24 15:04

I agree that the values here shouldn't need to be atomic; the atomics on the status flags can provide everything we need.

danhoeflinger, May 22 '24 17:05