hipamd
hipamd copied to clipboard
amd_detail: fix `atomicMax` and `atomicMin` for floating point types
The return value of atomicMax and atomicMin should be the old value from the destination memory.
This was not the case before the fix.
In cases where no concurrent thread is manipulating the destination the returned values were always the value passed to the atomic function.
test case (for float/double):
#include <assert.h>
#include <stdio.h>
#include "hip/hip_runtime.h"
template<typename T>
__global__ void testAtomicMin(bool* success, T operandOrig)
{
__shared__ T operand;
{
T const value = operandOrig / static_cast<T>(2);
T const reference = (operandOrig < value) ? operandOrig : value;
operand = operandOrig;
T const ret = atomicMin(&operand, value);
printf("1: %lf==%lf %lf==%lf\n", double(operandOrig), double(ret), double(value),double(reference));
assert(operandOrig == ret);
assert(value == reference);
}
{
T const value = static_cast<T>(2)*operandOrig;
T const reference = (operandOrig < value) ? operandOrig : value;
operand = operandOrig;
T const ret = atomicMin(&operand, value);
printf("2: %lf==%lf %lf==%lf\n", double(operandOrig), double(ret), double(operand),double(reference));
assert(operandOrig == ret);
assert(operand == reference);
}
}
#define HIP_ASSERT(x) (assert((x)==hipSuccess))
int main() {
bool* deviceResult;
HIP_ASSERT(hipMalloc((void**)&deviceResult,sizeof(bool)));
testAtomicMin<<<1,1>>>(deviceResult,42.f);
testAtomicMin<<<1,1>>>(deviceResult,42);
HIP_ASSERT(hipDeviceSynchronize());
HIP_ASSERT(hipFree(deviceResult));
return 0;
}
output:
1: 42.000000==21.000000 21.000000==21.000000
This issue has been already addressed in an internal version of hipamd. It will eventually make it on this repository as well.
@afanfa I would say that atomicMin(unsigned long long, unsigned long long) is broken too
https://github.com/ROCm-Developer-Tools/hipamd/blob/3ec1ccdbbbee7090ba854eddd1dee281973a4498/include/hip/amd_detail/amd_hip_atomic.h#L938-L951
If the data in memory tmp==32 and value==4) the return value is 4 but should be 32.
~~I found out that I checked the wrong branch, the main branch looks like and left over from older development, I opened an issue to remove this dead branch https://github.com/ROCm-Developer-Tools/hipamd/issues/41~~ [update: I updated the link to point to the develop branch, the bug is existing there too]
unsigned long long atomicMin and atomicMax is only wrong for architectures without __hip_atomic_compare_exchange_strong