SYCL Streams unit test fails on current main branch

Open mrnorman opened this issue 1 year ago • 5 comments

On current main branch, hash: d29e739f446cb9bcf3a12899cbabe754f471f58b

qsub -I -t 30 -n 1 -q florentia_debug
source jlse_gpu_O3.sh
make -j
make test
[ac.normanmr@florentia02:~/YAKL/unit/build/machines/jlse] >:O ./Streams/Streams 
Running on Intel(R) Graphics [0x0bd5]
1
YAKL FATAL ERROR:
ERROR: val1 is wrong
terminate called after throwing an instance of 'char const*'
Aborted

mrnorman avatar Mar 06 '23 20:03 mrnorman

Also, if -DYAKL_ENABLE_STREAMS is removed from the flags, we get a segmentation fault, and that needs to be fixed as well.

mrnorman avatar Mar 06 '23 20:03 mrnorman

I did see this error with the default runtime and the modules you were loading. Fortunately, the experimental runtime and SDK that I used to test multi-stream support fix this issue. I will put the details here for tracking, and we can close this once the official SDK includes the fix.

abagusetty avatar Mar 06 '23 21:03 abagusetty

Thanks!

mrnorman avatar Mar 06 '23 21:03 mrnorman

Sorry about the delay. The test works fine with both the default SDK and the experimental SDK, as shown in the logs below. I am looking into why the Streams test fails (i.e., segfaults) when -DYAKL_ENABLE_STREAMS is not used. Hope this helps.

With the latest compiler + drivers on Sunspot (the multi-stream test passes as expected)

sunspot_build_latest_module
#!/bin/bash

module purge
module use /soft/testing/modulefiles/
module load intel-UMD23.05.25593.11/23.05.25593.11
module load dpcpp-master
module load spack cmake
module list

../../cmakeclean.sh

unset GATOR_DISABLE

export CC="$(which clang)"
export CXX="$(which clang++)"
export FC="$(which gfortran)"
unset CXXFLAGS
unset FFLAGS

cmake -DYAKL_ARCH="SYCL" \
-DYAKL_SYCL_FLAGS="-O3 -DYAKL_ENABLE_STREAMS" \
-DCMAKE_CXX_FLAGS="-O3 -fsycl -sycl-std=2020 -fsycl-unnamed-lambda -fsycl-device-code-split=per_kernel -fsycl-targets=spir64_gen -Xsycl-target-backend \"-device 12.60.7\"" \
-DYAKL_F90_FLAGS="-O3" \
-DYAKL_C_FLAGS="-O3"   \
../../..

make -j
ctest --no-tests=error

Test log for the above build
Test project /lus/gila/projects/CSC249ADSE15_CNDA/abagusetty/yakl_stream/unit/build/machines/jlse
      Start  1: CArray_test
 1/17 Test  #1: CArray_test ......................   Passed    0.10 sec
      Start  2: FArray_test
 2/17 Test  #2: FArray_test ......................   Passed    0.08 sec
      Start  3: Gator_test
 3/17 Test  #3: Gator_test .......................   Passed    0.10 sec
      Start  4: Random_test
 4/17 Test  #4: Random_test ......................   Passed    0.07 sec
      Start  5: FFT_test
 5/17 Test  #5: FFT_test .........................   Passed    2.24 sec
      Start  6: Reductions_test
 6/17 Test  #6: Reductions_test ..................   Passed    0.10 sec
      Start  7: Atomics_test
 7/17 Test  #7: Atomics_test .....................   Passed    0.07 sec
      Start  8: Pentadiagonal_test
 8/17 Test  #8: Pentadiagonal_test ...............   Passed    0.01 sec
      Start  9: Tridiagonal_test
 9/17 Test  #9: Tridiagonal_test .................   Passed    0.01 sec
      Start 10: Lambda_test
10/17 Test #10: Lambda_test ......................   Passed    0.06 sec
      Start 11: Fortran_Link_test
11/17 Test #11: Fortran_Link_test ................Subprocess aborted***Exception:   0.29 sec
      Start 12: Fortran_Gator_test
12/17 Test #12: Fortran_Gator_test ...............   Passed    0.11 sec
      Start 13: OpenMP_Regions_test
13/17 Test #13: OpenMP_Regions_test ..............   Passed    0.06 sec
      Start 14: Intrinsics_test
14/17 Test #14: Intrinsics_test ..................   Passed    0.09 sec
      Start 15: ParForC_test
15/17 Test #15: ParForC_test .....................   Passed    0.06 sec
      Start 16: ParForFortran_test
16/17 Test #16: ParForFortran_test ...............   Passed    0.06 sec
      Start 17: Streams_test
17/17 Test #17: Streams_test .....................   Passed    2.94 sec

94% tests passed, 1 tests failed out of 17

Total Test time (real) =   6.50 sec

The following tests FAILED:
	 11 - Fortran_Link_test (Subprocess aborted)
Errors while running CTest
Output from these tests are in: /lus/gila/projects/CSC249ADSE15_CNDA/abagusetty/yakl_stream/unit/build/machines/jlse/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
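As a follow-up to the log above, the names of failed tests can be pulled out of the CTest summary and re-run one at a time with their output shown. This is a minimal sketch, not part of the original report: `failed_tests` and `rerun_one` are hypothetical helper names, and `rerun_one` assumes it is called from the same build directory that produced the log.

```shell
#!/bin/bash

# Hypothetical helper: read CTest summary text on stdin and print the name
# of each test listed under "The following tests FAILED:".
# It matches lines of the form "<number> - <name> (<reason>)".
failed_tests() {
  sed -n 's/^[[:space:]]*[0-9][0-9]* - \([A-Za-z0-9_]*\) (.*$/\1/p'
}

# Hypothetical helper: re-run a single test by exact name and show its output.
# -R selects tests by regular expression; --output-on-failure prints the
# test's stdout/stderr if it fails. Must be run from the CTest build directory.
rerun_one() {
  ctest -R "^$1\$" --output-on-failure
}

# Example using the failure line from the log above.
printf '\t 11 - Fortran_Link_test (Subprocess aborted)\n' | failed_tests
# prints: Fortran_Link_test
```

From the build directory, `rerun_one Fortran_Link_test` would re-run only the aborted test, while `ctest --rerun-failed --output-on-failure`, as the log itself suggests, re-runs every failure.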

Using the defaults (jlse_gpu_O3_AoT_PVC.sh): the multi-stream test fails with the default SDK, which is as expected.

Test log with the default SDK
abagusetty@x1921c0s2b0n0 /lus/gila/projects/CSC249ADSE15_CNDA/abagusetty/yakl_stream/unit/build/machines/jlse (sycl_stream_fortranlink) $ ./Streams/Streams 
Running on Intel(R) Graphics [0x0bd6]
3
5
Pool Memory High Water Mark:       1610612736
Pool Memory High Water Efficiency: 0.75

All the above tests

abagusetty avatar Mar 27 '23 20:03 abagusetty

Current main still fails for me on the JLSE florentia-debug node using jlse_gpu_O3.sh.

mrnorman avatar Apr 17 '23 23:04 mrnorman

Obviated by the removal of streams.

mrnorman avatar Oct 28 '24 12:10 mrnorman