c11_atomics: Fix cl_uint --> cl_half conversion on host
The atomic tests were incorrectly casting cl_uint values directly to cl_half types using simple C-style casts, which doesn't properly handle half-precision floating-point conversion.
This caused incorrect bit patterns when using atomic operations with half types, as the cast would truncate the integer rather than convert it to the proper IEEE 754 half-precision format.
Fixed by:
- Adding ConvertToHostDataType() helper function that uses cl_half_from_float() for TYPE_ATOMIC_HALF conversion
- Updating all atomic store/exchange operations in CBasicTestStore, CBasicTestLoad, and CBasicTestExchange to use the proper conversion
This ensures atomic operations with cl_half types use correct half-precision bit patterns rather than truncated integer values.
Hi, I'm having a hard time connecting the dots together. Can you please provide a test command line that I can run that demonstrates the problem? Thanks!
Hi, I'm having a hard time connecting the dots together. Can you please provide a test command line that I can run that demonstrates the problem? Thanks!
Sorry for the late response - I was distracted by other work.
The issue can be reproduced by running:
./OpenCL-CTS/build/test_conformance/c11_atomics/test_c11_atomics svm_atomic_store
with Intel OpenCL CPU implementation (which has fp16 support enabled).
Root cause
In the OpenCL kernel:
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void test_atomic_kernel(uint threadCount, uint numDestItems, volatile __global atomic_half *destMemory, __global half *oldValues)
{
uint tid = get_global_id(0);
atomic_store_explicit(&destMemory[tid], tid, memory_order_relaxed, memory_scope_all_devices);
}
When uint tid is stored as a half value, the OpenCL device properly converts the integer to IEEE 754 half-precision floating point format. However, in the host reference code, the original implementation simply cast cl_uint directly to cl_half:
host_atomic_store(&destMemory[tid], (HostDataType)tid, MemoryOrder());
cl_half is defined as uint16_t in OpenCL Header, a direct cast from cl_uint to cl_half just truncates the integer bits, rather than performing proper float-to-half conversion.
Ah, gotcha, thank you for the explanation!
FWIW, it was the static cast to a float in the snippet below that looked questionable (note: v is a uint)::
// For half types, convert from float to proper half-precision bit
// pattern
return cl_half_from_float(static_cast<float>(v), gHalfRoundingMode);
But, it looks like this is matching what the kernel is doing (convert the uint to a half), so it's correct.
Interestingly, there does not appear to be a problem with our GPU device, which does not support SVM atomics but does support fp16 atomic load and store. I am running:
./test_conformance/c11_atomics/test_c11_atomics atomic_store
I'll add "focused review" and we'll see if we can get this merged next week.
Discussed in the November 4th teleconference. Will merge after @shajder 's review.
Interestingly, there does not appear to be a problem with our GPU device, which does not support SVM atomics but does support fp16 atomic load and store. I am running:
./test_conformance/c11_atomics/test_c11_atomics atomic_store
@bashbaug Thanks for pointing this out. atomic_store passes for both CPU and GPU, because by default it does not use HostFunction for verification:
https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/common.h#L327-L333
https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/common.h#L1141
https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/common.h#L1426-L1428
https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/common.h#L146-L155
https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/test_atomics.cpp#L87-L92
The same issue can be exposed on CPU if we force using host threads for verification:
./test_conformance/c11_atomics/test_c11_atomics atomic_store -host
I've refactored the host half handling with a new wrapper class HostHalf so that we don't need much extra caution for the half conversions everywhere.