oneMKL icon indicating copy to clipboard operation
oneMKL copied to clipboard

[cuBLAS] Gemm tests using half can fail

Open Rbiessy opened this issue 1 year ago • 0 comments

Summary

cuBLAS tests running Gemm with half precision can fail with wrong results on A100.

Version

Using the tip of develop as of today (https://github.com/oneapi-src/oneMKL/commit/6923d402d5bccba9ae1966062bc5a277fc74776c).

Environment

Using A100 with the DPC++ release 2024.2.0 and the associated Codeplay Nvidia plugin. The CUDA version is 12.6.2, OS is Ubuntu 22.04.

Steps to reproduce

cmake -Bbuild-a100 -GNinja -DCMAKE_CXX_COMPILER=`which icpx` -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_CUBLAS_BACKEND=ON -DENABLE_CURAND_BACKEND=ON -DENABLE_CUSOLVER_BACKEND=ON -DENABLE_CUFFT_BACKEND=ON -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
cd build-a100
ninja
ctest -R ".*GemmUsmTests.*Half.*" --output-on-failure

Observed behavior

Full log: log_a100.txt Short extract:

[ RUN      ] GemmUsmTestSuite/GemmUsmTests.HalfHalfFloatPrecision/Column_Major_NVIDIA_A100_PCIE_40GB
relative error = 0.496206 absolute error = 0.382722 limit = 0.00010848
Difference in entry (0,0): DPC++ 0.388574 vs. Reference 0.771296
relative error = 1.36303 absolute error = 1.67412 limit = 0.00010848
Difference in entry (1,0): DPC++ 0.445891 vs. Reference -1.22823
relative error = 1.05343 absolute error = 0.664006 limit = 0.00010848
Difference in entry (2,0): DPC++ 0.0336805 vs. Reference -0.630325
relative error = 1.0674 absolute error = 0.514821 limit = 0.00010848
Difference in entry (3,0): DPC++ 0.0325077 vs. Reference -0.482313
relative error = 0.789876 absolute error = 0.992507 limit = 0.00010848
Difference in entry (4,0): DPC++ -0.264029 vs. Reference -1.25654
relative error = 0.925093 absolute error = 1.07784 limit = 0.00010848

The differences between the output and reference seem too large to be due to a precision issue.

Expected behavior

The tests should pass.

Rbiessy avatar Oct 21 '24 14:10 Rbiessy