[DFT][MKLGPU] Tests can fail on PVC

Open Rbiessy opened this issue 1 year ago • 0 comments

Summary

The DFT tests for the MKLGPU backend can fail on PVC.

Version

Using the tip of develop as of today (https://github.com/oneapi-src/oneMKL/commit/6923d402d5bccba9ae1966062bc5a277fc74776c).

Environment

Running on PVC (Intel Data Center GPU Max 1100 1.3) with the oneAPI Base Toolkit 2024.2.0. The OS is Ubuntu 22.04. Installed apt Level Zero package versions:

  • level-zero: 1.16.15-881~22.04
  • level-zero-dev: 1.16.15-881~22.04
  • intel-level-zero-gpu: 1.3.30049.10-950~22.04

Steps to reproduce

cmake -Bbuild-pvc -GNinja .
cd build-pvc
ninja
ctest --output-on-failure

Observed behavior

Full log: log_pvc.txt

The failing tests all seem to be 2D. Short extract:

[ RUN      ] ComputeTestSuite/ComputeTests_in_place_COMPLEX.COMPLEX_SINGLE_in_place_buffer/sizes_4x4_fwd_strides_0_7_1_bwd_strides_0_5_1_batches_2_Intel_R__Data_Center_GPU_Max_1100
Mismatching results: actual = (2.32784,-0.862237) vs. reference = (-0.0695089,0.350374)
 relative error = 7.52116 absolute error = 2.68658 relative bound = 9.53674e-05 absolute bound = 1.55578e-05
 at position 2, 0, 0
 at indices 10, 8
Mismatching results: actual = (1.28088,-0.619282) vs. reference = (2.32784,-0.862237)
 relative error = 0.432961 absolute error = 1.07478 relative bound = 9.53674e-05 absolute bound = 1.55578e-05
 at position 2, 1, 0
 at indices 11, 9
Mismatching results: actual = (0.626577,1.75821) vs. reference = (1.28088,-0.619282)
 relative error = 1.7332 absolute error = 2.46588 relative bound = 9.53674e-05 absolute bound = 1.55578e-05
 at position 2, 2, 0
 at indices 12, 10
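
The numbers in this extract can be cross-checked with a small sketch. It assumes the absolute error is |actual - reference| and the relative error is that value divided by |reference|. It also assumes "position i, j, b" means row, column, batch, that the first index is computed with the bwd strides from the test name, and that the second index is into a dense 4x4 reference. None of this is taken from the test sources; the helper names are hypothetical and only the values are copied from the log.

```python
# Values copied from the log extract: (position, actual, reference).
MISMATCHES = [
    ((2, 0, 0), complex(2.32784, -0.862237), complex(-0.0695089, 0.350374)),
    ((2, 1, 0), complex(1.28088, -0.619282), complex(2.32784, -0.862237)),
    ((2, 2, 0), complex(0.626577, 1.75821), complex(1.28088, -0.619282)),
]
REL_BOUND = 9.53674e-05  # "relative bound" from the log
ABS_BOUND = 1.55578e-05  # "absolute bound" from the log
BWD_STRIDES = (0, 5, 1)  # from "bwd_strides_0_5_1" in the test name
DENSE_STRIDES = (0, 4, 1)  # assumption: reference is a packed 4x4 array


def errors(actual, reference):
    """Return (absolute, relative) error under the assumed definitions."""
    abs_err = abs(actual - reference)
    return abs_err, abs_err / abs(reference)


def linear_index(position, strides):
    """Assumed mapping: offset + row * stride_1 + column * stride_2."""
    row, col, _batch = position
    offset, stride_1, stride_2 = strides
    return offset + row * stride_1 + col * stride_2


def shifted_by_one(entries):
    """True if each reference equals the actual value of the previous entry."""
    return all(entries[k + 1][2] == entries[k][1] for k in range(len(entries) - 1))


if __name__ == "__main__":
    for position, actual, reference in MISMATCHES:
        abs_err, rel_err = errors(actual, reference)
        over = abs_err > ABS_BOUND and rel_err > REL_BOUND
        print(position,
              f"abs={abs_err:.5f} rel={rel_err:.5f} over_both_bounds={over}",
              "indices:", linear_index(position, BWD_STRIDES),
              linear_index(position, DENSE_STRIDES))
    print("reference[k+1] == actual[k]:", shifted_by_one(MISMATCHES))
```

Under these assumptions the computed errors and indices agree with the log lines. Each reference value also matches the actual value reported one position earlier, which might point to a one-element shift along the row rather than a numerical accuracy problem. This has not been confirmed against the full log.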

Note that the BLAS failures are reported in a separate issue: https://github.com/oneapi-src/oneMKL/issues/600

Expected behavior

The tests should pass.

Rbiessy, Oct 21 '24 15:10