kokkos-kernels
Nightly Sycl unit test failures with intel/2023.1.0, intel/2024.1.0 on Intel Ponte Vecchio
Testing with the Sycl backend on Intel Ponte Vecchio on the new Blake showed several failing sub-tests (failure output is listed below each failing executable), depending on which environment variables are set:
Default (ZES_ENABLE_SYSMAN unset)
The following tests FAILED:
13 - sparse_sycl (Failed)
[ FAILED ] sycl_test.sparse_coo2crs
[ FAILED ] sycl_test.sparse_spgemm_jacobi_double_int_size_t_TestExecSpace
[ FAILED ] sycl_test.sparse_spgemm_double_int_size_t_TestExecSpace
[ FAILED ] sycl_test.sparse_par_ilut_double_int_size_t_TestExecSpace
[ FAILED ] sycl_test.sparse_par_ilut_precond_double_int_size_t_TestExecSpace
14 - blocksparse_sycl (Failed)
[ FAILED ] sycl_test.sparse_bsr_gauss_seidel_rank1_double_int_size_t_TestExecSpace
[ FAILED ] sycl_test.sparse_bsr_gauss_seidel_rank2_double_int_size_t_TestExecSpace
[ FAILED ] sycl_test.sparse_block_spgemm_double_int_size_t_TestExecSpace
22 - wiki_spgemm (Subprocess aborted)
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: No memory modules for the SYCL backend found. Make sure that ZES_ENABLE_SYSMAN=1 is set at run time!
ZES_ENABLE_SYSMAN=1
The following tests FAILED:
13/27 Test #13: sparse_sycl ......................Subprocess aborted***Exception: 45.76 sec
[==========] Running 48 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 48 tests from sycl_test
[ RUN ] sycl_test.sparse_coo2crs
/home/ndellin/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 4.34205 vs 3.75255e-10
row: 17, crs_col_ids_ref(504) = 20 mismatched values!
Begin arguments for above failure...
RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE130...): rand seed: 3072659895
scalar: N6Kokkos7complexIdEE
layout: N6Kokkos10LayoutLeftE
m: 130, n: 130
...end arguments for above failure.
...
[ FAILED ] sycl_test.sparse_coo2crs (20842 ms)
[ RUN ] sycl_test.sparse_spgemm_jacobi_double_int_size_t_TestExecSpace
terminate called after throwing an instance of 'std::runtime_error'
what(): There was a synchronous SYCL error:
Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
14/27 Test #14: blocksparse_sycl .................***Failed 33.50 sec
...
[ RUN ] sycl_test.sparse_block_spgemm_double_int_size_t_TestExecSpace
nentries_actual:1564 nentries_reference:2423
/home/ndellin/kokkos-kernels/sparse/unit_test/Test_Sparse_bspgemm.hpp:235: Failure
Value of: is_identical
Actual: false
Expected: true
SPGEMM_KK
...
Reproducer (Blake PV queue): SHAs: kokkos/kokkos@7e299b4e25c42528e105379c3aa9a318056545ba kokkos/kokkos-kernels@acdd8969109b53f2b3b0915ef51aef9800a44587
- Using cm_generate_makefile.bash this way requires the changes from #1960, which allow passing in extra CMake flags (used here to disable oneDPL)
module load cmake intel-oneapi-compilers/2023.1.0
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang++ --cxxflags="-fp-model=precise" --shared --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF
Edit: Added the SHAs used in testing
Updating the issue with failures as of SHA 32aa75a8f20ca88df64bde421c335b9fa6f68397
Configuration 1 (no TPLs):
salloc -N 1 -p PV
source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 git
# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1
# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang++ --cxxflags="-fp-model=precise" --shared --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF --kokkos-path=$KOKKOS_PATH
Test failures on PVC:
23:43:24 The following tests FAILED:
23:43:24 15 - sparse_sycl (SEGFAULT)
23:43:24 16 - blocksparse_sycl (Failed)
Configuration 2 (oneMKL):
salloc -N 1 -p PV
source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load git cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 intel-oneapi-mkl/2023.1.0 intel-oneapi-tbb/2021.9.0
# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1
# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise" --shared --with-tpls=mkl --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF --kokkos-path=$KOKKOS_PATH
Test failures on PVC:
05:49:17 The following tests FAILED:
05:49:17 9 - blas_sycl (Failed)
05:49:17 15 - sparse_sycl (Subprocess aborted)
05:49:17 16 - blocksparse_sycl (Failed)
05:49:17 26 - wiki_spadd (Subprocess aborted)
Joe installed Intel oneAPI 2024.1.0 on Blake, and I tested the MKL configuration above with it:
Test failures:
15/32 Test #15: sparse_sycl ......................***Failed 194.78 sec
...
[ PASSED ] 47 tests.
[ FAILED ] 4 tests, listed below:
[ FAILED ] sycl_test.sparse_spgemm_jacobi_double_int_int_TestDevice
[ FAILED ] sycl_test.sparse_spgemm_double_int_int_TestDevice
[ FAILED ] sycl_test.sparse_spmv_double_int_int_TestDevice
[ FAILED ] sycl_test.sparse_par_ilut_double_int_int_TestDevice
16/32 Test #16: blocksparse_sycl .................***Failed 29.87 sec
...
[==========] 7 tests from 1 test case ran. (29406 ms total)
[ PASSED ] 6 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] sycl_test.sparse_block_spgemm_double_int_int_TestDevice
Configuration (Sycl backend, intel/2024.1.0 with mkl/2024.0.0):
source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2024.1.0 intel-oneapi-dpl/2022.5.0 intel-oneapi-tbb/2021.12.0 intel-oneapi-mkl/2024.0.0
module list
# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1
# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise -Wno-pass-failed" --shared --with-tpls=mkl --kokkos-path=$KOKKOS_PATH
make -j16
# Unit tests
export ONEAPI_DEVICE_SELECTOR=ext_oneapi_level_zero:gpu
ctest --output-on-failure
I've been poking around with this:
In the SpGEMM, it seems that Kokkos::atomic_add(addr, val); always results in *addr = 0 (or possibly leaves *addr unchanged, i.e. Kokkos::atomic_add is a no-op).
However, *addr += val; causes some math to happen (though it produces incorrect values in a context where atomics are needed).
I've tried replacing Kokkos::atomic_add(addr, val) with various flavors of
auto v = sycl::atomic_ref<std::remove_reference_t<decltype(*addr)>,
                          sycl::memory_order::relaxed,
                          sycl::memory_scope::device,
                          sycl::access::address_space::global_space>(*addr);
v += val;
but no luck so far.
I've also tried running the Kokkos Core atomics unit tests built with the same Core that I use for the Kernels unit tests, and the Core atomic unit tests all pass.
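The three outcomes described above (result zeroed, result unchanged, or a correct sum) can be told apart with a very small check, provided the initial value is nonzero. The following is a host-only sketch of that check, not SYCL code and not the Kokkos implementation: `atomic_add_host`, `classify_add`, and `check_atomic_add` are hypothetical names, and the atomic add is emulated with a `std::atomic<double>` compare-and-swap loop.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical host-side stand-in for an atomic add on a double,
// written as a compare-and-swap loop (works with C++11 and later).
inline void atomic_add_host(std::atomic<double>& target, double val) {
  double expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak refreshes `expected` with the
  // current value, so the loop retries with up-to-date data.
  while (!target.compare_exchange_weak(expected, expected + val,
                                       std::memory_order_relaxed)) {
  }
}

enum class AddOutcome { Correct, Zeroed, Unchanged, Other };

// Classify what was observed after adding `val` to a slot that held
// `initial`. Both must be nonzero so the outcomes are distinguishable.
inline AddOutcome classify_add(double initial, double val, double observed) {
  assert(initial != 0.0 && val != 0.0);
  if (observed == initial + val) return AddOutcome::Correct;
  if (observed == 0.0) return AddOutcome::Zeroed;
  if (observed == initial) return AddOutcome::Unchanged;
  return AddOutcome::Other;
}

// Run one add on a fresh slot and report which outcome occurred.
inline AddOutcome check_atomic_add(double initial, double val) {
  std::atomic<double> slot(initial);
  atomic_add_host(slot, val);
  return classify_add(initial, val, slot.load());
}
```

On the host, `check_atomic_add(1.5, 2.25)` should report `AddOutcome::Correct`. The same classification could be applied to a value copied back from the device after a single `Kokkos::atomic_add` in a kernel, which would narrow down which of the failure modes is actually happening.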
Reimplementing alignPtr as
template <typename InPtr, typename T>
KOKKOS_INLINE_FUNCTION T *alignPtr(InPtr p) {
  std::uintptr_t ptrVal = reinterpret_cast<std::uintptr_t>(p);
  while (ptrVal % alignof(T)) {
    ++ptrVal;
  }
  return reinterpret_cast<T *>(ptrVal);
}
seems to make the SpGEMM unit tests pass. However, using the equivalent
template <typename InPtr, typename T>
KOKKOS_INLINE_FUNCTION T *alignPtr(InPtr p) {
  std::uintptr_t ptrVal = reinterpret_cast<std::uintptr_t>(p);
  return reinterpret_cast<T *>((ptrVal + alignof(T) - 1) / alignof(T) * alignof(T));
}
does not. This may be a SYCL compiler issue (unless (ptrVal + alignof(T) - 1) overflows).
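The overflow caveat can be ruled in or out directly. Unsigned arithmetic wraps, so (ptrVal + alignof(T) - 1) only misbehaves when ptrVal lies within alignof(T) - 1 of the largest representable address. A minimal sketch of that condition follows; `round_up_would_wrap` is a hypothetical helper name, not something in Kokkos Kernels.

```cpp
#include <cstdint>
#include <limits>

// Returns true if computing (ptrVal + align - 1) would wrap around
// in std::uintptr_t arithmetic. `align` is expected to be nonzero.
inline bool round_up_would_wrap(std::uintptr_t ptrVal, std::uintptr_t align) {
  const std::uintptr_t maxVal = std::numeric_limits<std::uintptr_t>::max();
  return align != 0 && ptrVal > maxVal - (align - 1);
}
```

For any ordinary device or host pointer this should return false, in which case overflow would not explain the difference between the two alignPtr versions.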
unsigned int f1(unsigned int i, unsigned int align) // today
{
  return ((i + align - 1) & (~(align - 1)));
}
unsigned int f2(unsigned int i, unsigned int align)
{
  return ((i + align - 1) / align * align);
}
unsigned int f3(unsigned int i, unsigned int align) // gcc
{
  return (i + align - 1) & (-align);
}
unsigned int f4(unsigned int i, unsigned int align)
{
  while (i % align) {
    ++i;
  }
  return i;
}
Only f4 works for SYCL SpGEMM.
In clang trunk for x86 on Godbolt, f1 and f3 compile to the same instructions; f2 and f4 each compile to different instructions.
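Since only one of the four variants works on the device, it helps to confirm that they really are arithmetically interchangeable. The sketch below is host-only: it repeats f1 through f4 from above and adds a hypothetical helper, `all_variants_agree`, that compares them over a range of inputs for power-of-two alignments. Note that f1 and f3 rely on bit masking and are only valid for power-of-two alignments, while f2 and f4 work for any nonzero alignment.

```cpp
#include <cassert>

unsigned int f1(unsigned int i, unsigned int align) {
  return (i + align - 1) & ~(align - 1);  // mask form, power-of-two only
}
unsigned int f2(unsigned int i, unsigned int align) {
  return (i + align - 1) / align * align;  // divide/multiply form
}
unsigned int f3(unsigned int i, unsigned int align) {
  return (i + align - 1) & (-align);  // negation mask, power-of-two only
}
unsigned int f4(unsigned int i, unsigned int align) {
  while (i % align) {  // increment until aligned
    ++i;
  }
  return i;
}

// Hypothetical helper: true if f1..f4 agree for every i in [0, max_i]
// and every power-of-two alignment in [1, max_align].
inline bool all_variants_agree(unsigned int max_i, unsigned int max_align) {
  assert(max_align <= (1u << 16));  // keeps `align *= 2` far from wrapping
  for (unsigned int align = 1; align <= max_align; align *= 2) {
    for (unsigned int i = 0; i <= max_i; ++i) {
      const unsigned int ref = f4(i, align);
      if (f1(i, align) != ref || f2(i, align) != ref || f3(i, align) != ref) {
        return false;
      }
    }
  }
  return true;
}
```

On the host, `all_variants_agree(1024, 64)` should return true, which is consistent with the suspicion that the device-side difference comes from code generation rather than from the arithmetic itself.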
Status update as of 7/9/2024, following the merge of some recent fixes:
Sycl + PV, no MKL
Failing tests
23:41:05 The following tests FAILED:
23:41:05 15 - sparse_sycl (Failed)
Failure output snips: sparse_sycl
23:40:11 [ RUN ] sycl_test.sparse_coo2crs
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 1.37386 vs 3.71438e-09
23:40:11 row: 31, crs_col_ids_ref(2871) = 25 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
23:40:11
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 3.55771 vs 3.71438e-09
23:40:11 row: 37, crs_col_ids_ref(3420) = 213 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
Sycl + PV, with MKL
Failing tests:
05:51:47 The following tests FAILED:
05:51:47 9 - blas_sycl (Failed)
05:51:47 15 - sparse_sycl (Failed)
05:51:47 16 - blocksparse_sycl (Failed)
Failure output snips: blas_sycl
05:47:45 [ RUN ] sycl_test.gemv_double
05:47:45 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(0)=-1.50128, h_y(0)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
...
05:47:45 beta = 0, input contains NaN, A is 2131x2131, mode T: gemv incorrect
05:47:45 [ FAILED ] sycl_test.gemv_double (643 ms)
05:47:45 [ RUN ] sycl_test.blas_gemv_streams_double_int_int_TestDevice
05:47:45 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/blas/unit_test/Test_Blas2_gemv.hpp:141: expected(0)=19.9203, h_y(0)=nan, eps=2.22045e-16, 1024*2*eps=4.54747e-13
...
05:47:45 Value of: 0
05:47:45 Expected: numErrors
05:47:45 Which is: 40
05:47:45 beta = 0, input contains NaN, A is 50x40, mode T: gemv incorrect
05:47:45 [ FAILED ] sycl_test.blas_gemv_streams_double_int_int_TestDevice (79 ms)
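Both gemv failures above report beta = 0 with NaN in the input and a NaN in h_y(0). Assuming the NaN is in the incoming y (an assumption based on the failure message, not confirmed in this log), the usual BLAS convention is that beta = 0 means y is overwritten rather than scaled, because 0 * NaN is still NaN. The following is a minimal host-side sketch of a transposed gemv that follows that convention; `gemv_t_ref` is a hypothetical reference function, not the Kokkos Kernels or oneMKL implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: y = alpha * A^T * x + beta * y,
// where A is m x n, stored row-major, x has length m, y has length n.
inline void gemv_t_ref(std::size_t m, std::size_t n, double alpha,
                       const std::vector<double>& A,
                       const std::vector<double>& x, double beta,
                       std::vector<double>& y) {
  for (std::size_t j = 0; j < n; ++j) {
    double acc = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
      acc += A[i * n + j] * x[i];  // column j of A dotted with x
    }
    // With beta == 0 the old y[j] is ignored entirely; evaluating
    // beta * y[j] would propagate a NaN because 0 * NaN is NaN.
    y[j] = (beta == 0.0) ? alpha * acc : alpha * acc + beta * y[j];
  }
}
```

With A = {1, 2, 3, 4} (2x2), x = {1, 1}, alpha = 1, beta = 0, and y pre-filled with NaN, this produces y = {4, 6}. An implementation that computes beta * y unconditionally would return NaN instead, which matches the symptom in the log.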
sparse_sycl:
05:50:55 [ RUN ] sycl_test.sparse_coo2crs
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
05:50:55 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 5.46403 vs 2.86173e-09
05:50:55 row: 19, crs_col_ids_ref(1595) = 288 mismatched values!
05:50:55 Begin arguments for above failure...
05:50:55 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE359...): rand seed: 3195414009
05:50:55 scalar: N6Kokkos7complexIdEE
05:50:55 layout: N6Kokkos10LayoutLeftE
05:50:55 m: 359, n: 359
05:50:55 ...end arguments for above failure.
...
05:50:55 [ RUN ] sycl_test.sparse_spmv_double_int_int_TestDevice
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/sparse/unit_test/Test_Sparse_spmv.hpp:216: Failure
05:50:55 Value of: threw
05:50:55 Actual: true
05:50:55 Expected: false
05:50:55 KokkosSparse::Test::spmv 1D, mode T: threw exception:
05:50:55 oneapi::mkl::sparse::gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op
...
05:50:55 [ RUN ] sycl_test.sparse_spmv_mv_double_int_int_LayoutLeft_TestDevice
05:50:55 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/sparse/unit_test/Test_Sparse_spmv.hpp:268: Failure
05:50:55 Value of: threw
05:50:55 Actual: true
05:50:55 Expected: false
05:50:55 KokkosSparse::Test::spmv 2D, mode T: threw exception:
05:50:55 oneapi::mkl::sparse::gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op
...
05:50:55 [ RUN ] sycl_test.sparse_sptrsv_double_int_int_TestDevice
05:50:55 unknown file: Failure
05:50:55 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
...
05:50:55 [ PASSED ] 47 tests.
05:50:55 [ FAILED ] 4 tests, listed below:
05:50:55 [ FAILED ] sycl_test.sparse_coo2crs
05:50:55 [ FAILED ] sycl_test.sparse_spmv_double_int_int_TestDevice
05:50:55 [ FAILED ] sycl_test.sparse_spmv_mv_double_int_int_LayoutLeft_TestDevice
05:50:55 [ FAILED ] sycl_test.sparse_sptrsv_double_int_int_TestDevice
05:50:55
05:50:55 4 FAILED TESTS
blocksparse_sycl
05:51:18 [ RUN ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice
05:51:18 unknown file: Failure
05:51:18 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
05:51:18 [ FAILED ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice (1627 ms)
05:51:18 [ RUN ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice
05:51:18 unknown file: Failure
05:51:18 C++ exception with description "oneapi::mkl::sparse::optimize_gemv: unimplemented functionality: currently only supports the oneapi::mkl::transpose::nontrans op" thrown in the test body.
05:51:18 [ FAILED ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice (3108 ms)
05:51:18 [----------] 7 tests from sycl_test (22517 ms total)
05:51:18
05:51:18 [----------] Global test environment tear-down
05:51:18 [==========] 7 tests from 1 test case ran. (22517 ms total)
05:51:18 [ PASSED ] 5 tests.
05:51:18 [ FAILED ] 2 tests, listed below:
05:51:18 [ FAILED ] sycl_test.sparse_bsr_spmv_double_int_int_TestDevice
05:51:18 [ FAILED ] sycl_test.sparse_bsr_spmmv_double_int_int_LayoutLeft_TestDevice
Status update 7/12/2024:
After the recent gemv fallback updates, the Sycl builds are in better shape with only the sparse_coo2crs test failure remaining:
sparse_sycl
23:40:11 [ RUN ] sycl_test.sparse_coo2crs
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 1.37386 vs 3.71438e-09
23:40:11 row: 31, crs_col_ids_ref(2871) = 25 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.
23:40:11
23:40:11 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
23:40:11 Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 3.55771 vs 3.71438e-09
23:40:11 row: 37, crs_col_ids_ref(3420) = 213 mismatched values!
23:40:11 Begin arguments for above failure...
23:40:11 RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE409...): rand seed: 990578628
23:40:11 scalar: N6Kokkos7complexIdEE
23:40:11 layout: N6Kokkos10LayoutLeftE
23:40:11 m: 409, n: 409
23:40:11 ...end arguments for above failure.