compute-runtime
ocl: incorrect atomics behavior on Celeron/Atom with HD Graphics
Global-memory updates using 32-bit atomics behave non-atomically on Intel HD Graphics integrated into Celeron/Atom platforms. Specifically, this was observed on an Intel(R) Celeron(R) CPU J3455 @ 1.50GHz (as reported by lscpu) with Intel(R) Graphics [0x5a85] (as reported by clinfo). It is likely reproducible on similar Celeron/Atom based CPUs with integrated HD Graphics, and is perhaps caused by a misconfiguration or missing enablement of features in the driver stack (either the i915 KMD or further up the stack, i.e., the compute runtime).
How to reproduce:
cd ${HOME}
git clone https://github.com/hfp/libxsmm.git
cd libxsmm
git checkout 885830e65da003fc4c72113239080a7c069647b5
make -j
cd ${HOME}
git clone https://github.com/hfp/dbcsr.git
cd dbcsr
git checkout 0684ae7c14c43d842059f1cfb9b5646594fa9740
cd src/acc
echo "edit acc_bench_smm.c:22 and change 'double' to 'float'"
cd opencl
make
../acc_bench_smm
The console output of the command below looks like:
../acc_bench_smm 3 30000 23 23 23 1875 18750 18750
typename (id=1): float
copy-in: 67.8 ms 2.4 GB/s
transpose: 49.1 ms 13.8 GFLOPS/s
device: 44.7 ms 15.2 GFLOPS/s
host: 34.3 ms 19.8 GFLOPS/s
max.error: abs=849.74 rel=1
In the above output (max.error: abs=849.74 rel=1), the error appears to be due to data races or non-atomic updates. Generally, GEN9 based devices integrated into Core based processors work just fine (atomic flow). Like Core, the Celeron/Atom based OpenCL platform advertises sufficient support for atomic operations, namely cl_khr_global_int32_base_atomics and cl_khr_global_int32_extended_atomics, both of which are used by the reproducer.
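For anyone who wants to double-check what a device advertises, here is a minimal host-side sketch in plain C. It only does the string matching; in a real program the extension list would come from clGetDeviceInfo with CL_DEVICE_EXTENSIONS. The helper names has_extension and advertises_int32_atomics are made up for this illustration and are not part of the reproducer or the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `name` occurs as a complete,
 * space-separated token in the extension list `ext_list`. */
bool has_extension(const char *ext_list, const char *name) {
  assert(ext_list != NULL && name != NULL);
  const size_t len = strlen(name);
  const char *p = ext_list;
  if (len == 0) return false;
  while ((p = strstr(p, name)) != NULL) {
    /* Reject partial matches such as "<name>_suffix" or "prefix_<name>". */
    const bool starts = (p == ext_list) || (p[-1] == ' ');
    const bool ends = (p[len] == '\0') || (p[len] == ' ');
    if (starts && ends) return true;
    p += len;
  }
  return false;
}

/* Hypothetical helper: checks the two extensions named in this report. */
bool advertises_int32_atomics(const char *ext_list) {
  return has_extension(ext_list, "cl_khr_global_int32_base_atomics") &&
         has_extension(ext_list, "cl_khr_global_int32_extended_atomics");
}
```

Note that a positive result only says the extensions are advertised; as this report shows, it does not prove the atomics behave correctly on the device.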
The reproducer implements atomic FP32 updates using the usual flow based on cmpxchg or xchg. The atomic implementation can be selected using OPENCL_LIBSMM_SMM_ATOMICS=cmpxchg (default on GEN9), OPENCL_LIBSMM_SMM_ATOMICS=xchg, or OPENCL_LIBSMM_SMM_ATOMICS=0. The last of these replaces the atomic flow with a plain FP32 add ("+=") and is meant for observing/studying performance differences. However, on Celeron/Atom based GEN9, the accumulated error due to data races is similar between the supposedly atomic flow and the non-atomic flow.
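To illustrate what "the usual flow" means, here is a host-side analogue written with C11 atomics. It is a sketch of the general technique, not the reproducer's kernel code: the OpenCL kernels would use the atomic_cmpxchg and atomic_xchg built-ins on global memory, and the function names below are assumptions made for this example. The float is stored as its 32-bit pattern so that 32-bit integer atomics can be applied to it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Bit-cast helpers (memcpy avoids aliasing issues). */
static uint32_t f32_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float bits_f32(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* cmpxchg flow: read the current value, compute the sum, and publish it
 * only if nobody changed the location in the meantime; otherwise retry
 * with the freshly observed value. */
void atomic_add_f32_cmpxchg(_Atomic uint32_t *dst, float inc) {
  uint32_t expected = atomic_load(dst);
  uint32_t desired;
  do {
    desired = f32_bits(bits_f32(expected) + inc);
  } while (!atomic_compare_exchange_weak(dst, &expected, desired));
}

/* xchg flow: swap the accumulator out (leaving 0), add the pending
 * amount, and swap the sum back in. If another thread deposited a
 * value in between, the second swap returns it and it becomes the
 * next pending amount. */
void atomic_add_f32_xchg(_Atomic uint32_t *dst, float inc) {
  float pending = inc;
  while (pending != 0.0f) {
    const float taken = bits_f32(atomic_exchange(dst, f32_bits(0.0f)));
    pending = bits_f32(atomic_exchange(dst, f32_bits(taken + pending)));
  }
}
```

Both variants rely entirely on the 32-bit compare-exchange or exchange being truly atomic. If the underlying operation is not, concurrent updates are lost in the same way as with a plain "+=", which would match the error pattern described above.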
We have reproduced the issue and placed the bug in our debug queue, but do not have an ETA for a fix.
Thank you very much!
@AdamCetnerowski Over a year has passed. Any updates?