aomp: Add tests for fast FP atomics

Open · carlobertolli opened this issue 4 years ago · 2 comments

Sep 03 '21 15:09 carlobertolli

A hint is necessary because the compiler cannot determine in every case whether using fast FP atomics is safe. In particular, they are only safe on coarse-grain memory pages, which are enabled for mapped items as in this PR: https://github.com/ROCm-Developer-Tools/llvm-project/pull/149

In USM (unified shared memory) mode, a pointer that is not mapped can still be used in a target region, but its pages may be managed as fine-grain memory. Fast FP atomics on fine-grain memory are a no-op, which is not what a user of an atomic operation would expect.
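The tests compare fast FP atomics against a compare-and-swap (CAS) loop. The following is a minimal host-side sketch of that CAS technique using C11 atomics. It is not AOMP's test code and not what the compiler emits for AMD GPUs. It assumes the common pattern of storing the floating-point value as an integer of the same width, computing the new sum, and retrying the CAS until no other thread has modified the value in between. All function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: atomically performs *target += val, where *target holds
 * the raw bits of a double. Returns the value seen just before the update. */
static double cas_atomic_add_double(_Atomic uint64_t *target, double val)
{
    uint64_t old_bits = atomic_load(target);
    for (;;) {
        double old_val, new_val;
        uint64_t new_bits;
        memcpy(&old_val, &old_bits, sizeof old_val);   /* bits -> double */
        new_val = old_val + val;
        memcpy(&new_bits, &new_val, sizeof new_bits);  /* double -> bits */
        /* Succeeds only if *target still holds old_bits; on failure,
         * old_bits is reloaded with the current contents and we retry. */
        if (atomic_compare_exchange_weak(target, &old_bits, new_bits))
            return old_val;
    }
}

/* Same retry loop for float, using a 32-bit compare-and-swap. */
static float cas_atomic_add_float(_Atomic uint32_t *target, float val)
{
    uint32_t old_bits = atomic_load(target);
    for (;;) {
        float old_val, new_val;
        uint32_t new_bits;
        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + val;
        memcpy(&new_bits, &new_val, sizeof new_bits);
        if (atomic_compare_exchange_weak(target, &old_bits, new_bits))
            return old_val;
    }
}

/* Hypothetical drivers: accumulate 0, 1, ..., n-1 through the CAS loop.
 * They run serially here; in a real test many threads would race on acc. */
static double cas_sum_double(int n)
{
    assert(n >= 0);
    double result = 0.0;
    uint64_t bits;
    memcpy(&bits, &result, sizeof bits);
    _Atomic uint64_t acc;
    atomic_init(&acc, bits);
    for (int i = 0; i < n; ++i)
        cas_atomic_add_double(&acc, (double)i);
    bits = atomic_load(&acc);
    memcpy(&result, &bits, sizeof result);
    return result;
}

static float cas_sum_float(int n)
{
    assert(n >= 0);
    float result = 0.0f;
    uint32_t bits;
    memcpy(&bits, &result, sizeof bits);
    _Atomic uint32_t acc;
    atomic_init(&acc, bits);
    for (int i = 0; i < n; ++i)
        cas_atomic_add_float(&acc, (float)i);
    bits = atomic_load(&acc);
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, `cas_sum_double(5001)` and `cas_sum_float(5001)` both return 12502500, the total printed in the benchmark output. Operating on the integer bit pattern mirrors how a CAS loop is typically built when the hardware offers integer CAS but no safe native floating-point add for that memory.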

Sep 21 '21 15:09 carlobertolli

Running the first four tests on an MI100 shows no performance difference between the CAS loop and fast FP atomics for single-precision floating-point add. For double precision, the CAS loop is substantially slower.

Success atomic sum of 5001 double's using CAS loop is: 12502500.000000 in 0.561011 secs
Success atomic sum of 5001 float's using CAS loop is: 12502500.000000 in 0.006414 secs
Success atomic sum of 5001 double's using fast FP atomics is: 12502500.000000 in 0.006381 secs
Success atomic sum of 5001 float's using fast FP atomics is: 12502500.000000 in 0.006408 secs
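Two quick checks on these numbers follow. They assume, from the count of 5001 values and the printed total, that each test sums the integers 0 through 5000. The total stays below $2^{24}$, so every partial sum is an integer that is exactly representable in single precision. Float and double therefore produce the same exact result regardless of accumulation order. The timing ratios, CAS loop over fast FP atomics, quantify the gap:

```latex
\sum_{i=0}^{5000} i = \frac{5000 \cdot 5001}{2} = 12\,502\,500 < 2^{24} = 16\,777\,216

\text{double: } \frac{0.561011\ \text{s}}{0.006381\ \text{s}} \approx 87.9,
\qquad
\text{float: } \frac{0.006414\ \text{s}}{0.006408\ \text{s}} \approx 1.001
```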

Jan 07 '22 02:01 zjin-lcf