OpenCL-CTS icon indicating copy to clipboard operation
OpenCL-CTS copied to clipboard

c11_atomics: Fix cl_uint --> cl_half conversion on host

Open Nuullll opened this issue 3 months ago • 3 comments

The atomic tests were incorrectly casting cl_uint values directly to cl_half types using simple C-style casts, which doesn't properly handle half-precision floating-point conversion.

This caused incorrect bit patterns when using atomic operations with half types, as the cast would truncate the integer rather than convert it to the proper IEEE 754 half-precision format.

Fixed by:

  • Adding ConvertToHostDataType() helper function that uses cl_half_from_float() for TYPE_ATOMIC_HALF conversion
  • Updating all atomic store/exchange operations in CBasicTestStore, CBasicTestLoad, and CBasicTestExchange to use the proper conversion

This ensures atomic operations with cl_half types use correct half-precision bit patterns rather than truncated integer values.

Nuullll avatar Aug 26 '25 08:08 Nuullll

Hi, I'm having a hard time connecting the dots together. Can you please provide a test command line that I can run that demonstrates the problem? Thanks!

bashbaug avatar Sep 16 '25 20:09 bashbaug

Hi, I'm having a hard time connecting the dots together. Can you please provide a test command line that I can run that demonstrates the problem? Thanks!

Sorry for the late response - I was distracted by other work.

The issue can be reproduced by running:

./OpenCL-CTS/build/test_conformance/c11_atomics/test_c11_atomics svm_atomic_store

with Intel OpenCL CPU implementation (which has fp16 support enabled).

Root cause

In the OpenCL kernel:

#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void test_atomic_kernel(uint threadCount, uint numDestItems, volatile __global atomic_half *destMemory, __global half *oldValues)
{
  uint  tid = get_global_id(0);
  atomic_store_explicit(&destMemory[tid], tid, memory_order_relaxed, memory_scope_all_devices);
}

When uint tid is stored as a half value, the OpenCL device properly converts the integer to IEEE 754 half-precision floating point format. However, in the host reference code, the original implementation simply cast cl_uint directly to cl_half:

host_atomic_store(&destMemory[tid], (HostDataType)tid, MemoryOrder());

cl_half is defined as uint16_t in OpenCL Header, a direct cast from cl_uint to cl_half just truncates the integer bits, rather than performing proper float-to-half conversion.

Nuullll avatar Oct 30 '25 05:10 Nuullll

Ah, gotcha, thank you for the explanation!

FWIW, it was the static cast to a float in the snippet below that looked questionable (note: v is a uint)::

            // For half types, convert from float to proper half-precision bit
            // pattern
            return cl_half_from_float(static_cast<float>(v), gHalfRoundingMode);

But, it looks like this is matching what the kernel is doing (convert the uint to a half), so it's correct.

Interestingly, there does not appear to be a problem with our GPU device, which does not support SVM atomics but does support fp16 atomic load and store. I am running:

./test_conformance/c11_atomics/test_c11_atomics atomic_store

I'll add "focused review" and we'll see if we can get this merged next week.

bashbaug avatar Oct 30 '25 15:10 bashbaug

Discussed in the November 4th teleconference. Will merge after @shajder 's review.

bashbaug avatar Nov 04 '25 16:11 bashbaug

Interestingly, there does not appear to be a problem with our GPU device, which does not support SVM atomics but does support fp16 atomic load and store. I am running:

./test_conformance/c11_atomics/test_c11_atomics atomic_store

@bashbaug Thanks for pointing this out. atomic_store passes for both CPU and GPU, because by default it does not use HostFunction for verification: https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/common.h#L327-L333 https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/common.h#L1141 https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/common.h#L1426-L1428 https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/common.h#L146-L155 https://github.com/KhronosGroup/OpenCL-CTS/blob/e641de99a5c6f671c63e490acd5da0e492a4438f/test_conformance/c11_atomics/test_atomics.cpp#L87-L92

The same issue can be exposed on CPU if we force using host threads for verification:

./test_conformance/c11_atomics/test_c11_atomics atomic_store -host

Nuullll avatar Nov 06 '25 07:11 Nuullll

I've refactored the host half handling with a new wrapper class HostHalf so that we don't need much extra caution for the half conversions everywhere.

Nuullll avatar Nov 24 '25 08:11 Nuullll