b-sumner
Section 8.3.3 of the C spec specifically allows sampling to flush denorms, so this change seems to be long overdue.
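For anyone unsure what "flushing denorms" means in practice, here is a minimal host-side C++ sketch. It assumes flush-to-zero replaces a subnormal float with a zero of the same sign; the name `flush_denorm` is hypothetical and this is not how any particular sampler is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: models flush-to-zero for a single float.
// Subnormal inputs become a zero with the same sign; everything else passes through.
float flush_denorm(float x) {
    if (std::fpclassify(x) == FP_SUBNORMAL) {
        return std::copysign(0.0f, x);
    }
    return x;
}

// Small self-check illustrating the behaviour described above.
void flush_denorm_demo() {
    const float tiny = std::numeric_limits<float>::denorm_min();
    assert(flush_denorm(tiny) == 0.0f);   // subnormal is flushed
    assert(flush_denorm(1.0f) == 1.0f);   // normal value is unchanged
}
```

A sampler that is allowed to flush may return the flushed value where a strictly IEEE-conformant path would return the subnormal itself.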
It just seems strange that the CTS build looks for the OpenCL headers and tells me when it can't find them, but this build doesn't.
I was against introducing OpFRem, and still believe that it should be removed rather than producing completely wrong answers in certain cases. Why can't those that want an approximation that...
Which OS are you running? For Ubuntu 20.04, you need the HWE stack. I'm not sure about other distros.
We have some work left in the device compiler to support certain CUDA 9 device-side features such as the sync APIs. Also note that most AMD devices have a...
The *_sync functions are not available in 6.1, see, e.g. https://github.com/ROCm/clr/tree/rocm-6.1.x/hipamd/include/hip/amd_detail . The develop branch has an implementation which may appear in a future release.
The develop implementation mentioned above has restrictions on its use that match the restrictions stated for Pascal in the CUDA guide.
It's hard to say much without an example. Can you provide access to a minimal self-contained example (i.e. a single file that can be simply compiled with hipcc as opposed...
The compiler will try to form packed operations from arbitrary code and will attempt to form fma when contractions are enabled. But you can raise the likelihood of packed fma...
This is OpenCL, correct? fma(float2, float2, float2) is a standard OpenCL builtin.
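To make the semantics concrete, here is a rough host-side C++ sketch of what a two-component fma computes: a fused multiply-add applied independently to each component. `Float2` and `fma2` are hypothetical names for illustration only; in OpenCL C you would call the builtin `fma` on `float2` values directly, and whether the compiler emits a packed instruction for it depends on the target.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for OpenCL's float2 vector type.
struct Float2 {
    float x;
    float y;
};

// Componentwise fused multiply-add: each lane computes a*b + c with a single rounding.
Float2 fma2(Float2 a, Float2 b, Float2 c) {
    return Float2{std::fma(a.x, b.x, c.x), std::fma(a.y, b.y, c.y)};
}

// Small self-check: (1*3 + 5, 2*4 + 6) == (8, 14).
void fma2_demo() {
    const Float2 r = fma2(Float2{1.0f, 2.0f}, Float2{3.0f, 4.0f}, Float2{5.0f, 6.0f});
    assert(r.x == 8.0f);
    assert(r.y == 14.0f);
}
```

Writing the operation on the whole vector, rather than on each scalar separately, keeps both lanes together in the source.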