[BLAS][MKLGPU] Trsv tests can fail on PVC

Open Rbiessy opened this issue 1 year ago • 0 comments

Summary

The MKLGPU backend Trsv tests can fail with a GPU page fault when run on PVC.

Version

Using the tip of develop as of today (https://github.com/oneapi-src/oneMKL/commit/6923d402d5bccba9ae1966062bc5a277fc74776c).

Environment

Running on PVC (GPU Max 1100 1.3) with the oneAPI Base Toolkit 2024.2.0. The OS is Ubuntu 22.04. The apt Level Zero package versions are:

  • level-zero: 1.16.15-881~22.04
  • level-zero-dev: 1.16.15-881~22.04
  • intel-level-zero-gpu: 1.3.30049.10-950~22.04

Steps to reproduce

cmake -Bbuild-pvc -GNinja -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
cd build-pvc
ninja
ctest -R ".*Trsv.*" --output-on-failure
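
The summary says the tests "can" fail, so the failure may be intermittent (an assumption; the report does not say how often it occurs). A minimal sketch of a helper that re-runs the same test filter until the first failure and saves the output follows. The function name `run_trsv_repeat` and the log file name `trsv_repeat.log` are hypothetical, and the `--repeat` and `--test-dir` options assume CMake 3.20 or newer.

```shell
# Hypothetical helper: repeat the Trsv tests until one fails.
# Assumes CMake >= 3.20 (for --test-dir; --repeat needs >= 3.17).
run_trsv_repeat() {
  build_dir="${1:-build-pvc}"   # build directory from the steps above
  repeats="${2:-10}"            # maximum number of runs per test
  # --repeat until-fail:N stops re-running a test at its first failure;
  # --output-log keeps a copy of the output for attaching to the issue.
  ctest --test-dir "$build_dir" -R ".*Trsv.*" --output-on-failure \
        --repeat "until-fail:${repeats}" --output-log trsv_repeat.log
}
```

Run from the source root, `run_trsv_repeat build-pvc 20` would execute each matching test up to 20 times and stop repeating it at the first failure.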

Observed behavior

Full log: log_pvc.txt

The tests are failing with:

FATAL: Unexpected page fault from GPU at 0x7fa3dc0df000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 0 (PTE), access: 0 (Read), banned: 1, aborting.
FATAL: Unexpected page fault from GPU at 0x7fa3dc0df000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 0 (PTE), access: 0 (Read), banned: 1, aborting.
Abort was called at 287 line in file:
./shared/source/os_interface/linux/drm_neo.cpp

Note that the DFT failures are reported in a separate issue: https://github.com/oneapi-src/oneMKL/issues/601

Expected behavior

The tests should pass.

Rbiessy, Oct 21 '24 15:10