aomp
Add tests for fast FP atomics
The use of a hint is necessary because the compiler cannot determine all cases in which fast FP atomics are safe to use. In particular, they are only safe on coarse-grain memory pages, which are enabled for mapped items as in this PR: https://github.com/ROCm-Developer-Tools/llvm-project/pull/149
In unified shared memory (USM) mode, an unmapped pointer can still be used in a target region, but its pages may be managed as fine-grain memory. Fast FP atomics on fine-grain memory are no-ops, which is not what a user of an atomic operation would expect.
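Running the first four tests on MI100 shows no performance difference for single-precision floating-point add. For double precision, however, fast FP atomics are roughly 88x faster than the CAS loop (0.006381 s vs. 0.561011 s).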
Success atomic sum of 5001 double's using CAS loop is: 12502500.000000 in 0.561011 secs
Success atomic sum of 5001 float's using CAS loop is: 12502500.000000 in 0.006414 secs
Success atomic sum of 5001 double's using fast FP atomics is: 12502500.000000 in 0.006381 secs
Success atomic sum of 5001 float's using fast FP atomics is: 12502500.000000 in 0.006408 secs