[BLAS][MKLGPU] Trsv tests can fail on PVC
Summary
The MKLGPU backend tests can fail when running Trsv on PVC.
Version
Tip of the develop branch at the time of filing (https://github.com/oneapi-src/oneMKL/commit/6923d402d5bccba9ae1966062bc5a277fc74776c).
Environment
Running on PVC (GPU Max 1100 1.3) with the oneAPI Base Toolkit 2024.2.0. The OS is Ubuntu 22.04. apt package versions for Level Zero:
- level-zero: 1.16.15-881~22.04
- level-zero-dev: 1.16.15-881~22.04
- intel-level-zero-gpu: 1.3.30049.10-950~22.04
Steps to reproduce
cmake -Bbuild-pvc -GNinja -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
cd build-pvc
ninja
ctest -R ".*Trsv.*" --output-on-failure
Observed behavior
Full log: log_pvc.txt
The tests fail with:
FATAL: Unexpected page fault from GPU at 0x7fa3dc0df000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 0 (PTE), access: 0 (Read), banned: 1, aborting.
FATAL: Unexpected page fault from GPU at 0x7fa3dc0df000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 0 (PTE), access: 0 (Read), banned: 1, aborting.
Abort was called at 287 line in file:
./shared/source/os_interface/linux/drm_neo.cpp
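When triaging a long log, it can help to check whether every page fault hits the same address, as the two lines above do. The following is a minimal sketch, assuming the attached log has been saved locally and uses the "Unexpected page fault from GPU at 0x..." format shown above. The function name `summarize_page_faults` is hypothetical and not part of oneMKL or the driver tooling.

```shell
#!/bin/sh
# Sketch: count GPU page faults per faulting address in a test log.
# Assumes log lines contain "Unexpected page fault from GPU at 0x<hex>".
# summarize_page_faults is a hypothetical helper name.
summarize_page_faults() {
    log_file="$1"
    # Keep only the matching fragment, strip everything up to the address,
    # then group identical addresses and print "<address>: <count>".
    grep -o 'Unexpected page fault from GPU at 0x[0-9a-fA-F]*' "$log_file" \
        | sed 's/.* at //' \
        | sort \
        | uniq -c \
        | awk '{print $2 ": " $1}'
}

# Example usage (assumes the attached log is saved as log_pvc.txt):
# summarize_page_faults log_pvc.txt
```

A single address repeated across all failures would suggest one bad access pattern rather than several unrelated faults, though that is only a starting point for investigation.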
Note: the DFT failures are reported in a separate issue: https://github.com/oneapi-src/oneMKL/issues/601
Expected behavior
The tests should pass.