Does cl_khr_integer_dot_product require OpenCL C 3.0?
I'm not sure if this is a Clang issue or an OpenCL spec issue, but I figure I'll start here.
It looks like Clang compiles OpenCL kernels that use the cl_khr_integer_dot_product extension when passed -cl-std=CL3.0, but it does not when compiling for previous OpenCL C versions.
- Here is an example that fails when compiling for OpenCL C 1.2: https://godbolt.org/z/9xPxWhPYq
- Here is the same example that succeeds when compiling for OpenCL C 3.0: https://godbolt.org/z/MWMMrWhnq
Logically speaking, this is the behavior described in the OpenCL C spec:
- The Features section says that "Feature test macros require support for OpenCL C 3.0 or newer."
- The new built-in functions added by
cl_khr_integer_dot_productfunctions (e.g. dot) say "Requires that the __opencl_c_integer_dot_product_input_4x8bit feature macro is defined."
This behavior is surprising to some users, though, especially because the metadata page for cl_khr_integer_dot_product doesn't say anything about an OpenCL C 3.0 requirement. In fact, instead it says (emphasis mine): "OpenCL C compilers supporting this extension will define the extension macro cl_khr_integer_dot_product, and may define corresponding feature macros __opencl_c_integer_dot_product_input_4x8bit and __opencl_c_integer_dot_product_input_4x8bit_packed depending on the reported capabilities."
What is the intended behavior? Does cl_khr_integer_dot_product require OpenCL C 3.0?
fyi rusticl simply passes the defines to the compiler explicitly and that works out fine: -cl-std=cl1.2 -target spir64 -emit-llvm -Xclang -finclude-default-header -g0 -O0 -Dcl_khr_integer_dot_product=1 -D__opencl_c_integer_dot_product_input_4x8bit_packed=1 -D__opencl_c_integer_dot_product_input_4x8bit=1
Clang enables a few extensions by default if you target SPIRV but it's not guaranteed for all of them, so in rusticl we are very explicit about what and what not to enable.
In regards to optional feature macros: I don't see why an OpenCL runtime wouldn't be allowed to set them when targeting OpenCL C 1.2, though maybe runtimes shouldn't and we can discuss this, but then an extension could define those for any OpenCL C version anyway.
Hmm, cl_khr_subgroup_rotate seems to have the same issue.
- Fails with OpenCL C 1.2: https://godbolt.org/z/d61vTsnsG
- Passes with OpenCL C 3.0: https://godbolt.org/z/8E19dna8Y
I don't see any feature macros associated with cl_khr_subgroup_rotate, so maybe that's unrelated to the issue?
Hmm,
cl_khr_subgroup_rotateseems to have the same issue.* Fails with OpenCL C 1.2: https://godbolt.org/z/d61vTsnsG * Passes with OpenCL C 3.0: https://godbolt.org/z/8E19dna8YI don't see any feature macros associated with
cl_khr_subgroup_rotate, so maybe that's unrelated to the issue?
the issue is the opencl-c-base.h header file: https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/opencl-c-base.h#L14
The only sane way for users of libclang using a SPIR(V) target to deal with enabling OpenCL C extenions is to do it all by hand as we do in mesa: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/compiler/clc/clc_helpers.cpp?ref_type=heads#L935
Discussed in the March 20th OpenCL tooling subgroup. In summary:
- The most reliable (and hence recommended) method to have the right extension and feature macros defined is to do it all "by hand", similar to the Mesa solution.
- The header file in Clang is trying to do the right thing on a best effort basis, but it should not be treated as a reference solution, and in its current state it appears to be broken and doing more harm than good.
- We will find an owner to clean up the header file in Clang, specifically by removing the extension parts that are wrapped by
#if defined(__SPIR__) || defined(__SPIRV__). This will then require SPIR-V generators to define the right extension and feature macros by hand, so we will need to do this with some care. - Separately, longer-term, we will investigate defining extension defines (and perhaps feature macros?) based on the Clang
-cl-ext=option, but we're not sure this will work in all cases.
To help drive discussion, I wrote a short test to check how different implementations behave for extension defines for different OpenCL C versions: https://github.com/KhronosGroup/OpenCL-CTS/pull/2376
There may be bugs in the test, but none of the OpenCL implementations I tested pass the test in its current form.