
RAJA CUDA illegal memory access for the ConvDiff test

Open ctian282 opened this issue 4 years ago • 0 comments

After successfully compiling SAMRAI with RAJA and CUDA, many of the tests fail, including the ConvDiff test, which produces the following error message:

  1/953 Test   #1: blt_gtest_smoke .......................................................   Passed    0.00 sec
        Start   2: blt_fruit_smoke
  2/953 Test   #2: blt_fruit_smoke .......................................................   Passed    0.00 sec
        Start   3: blt_openmp_smoke
  3/953 Test   #3: blt_openmp_smoke ......................................................   Passed    0.00 sec
        Start   4: blt_mpi_smoke
  4/953 Test   #4: blt_mpi_smoke .........................................................   Passed    0.36 sec
        Start   5: blt_cuda_smoke
  5/953 Test   #5: blt_cuda_smoke ........................................................   Passed    0.21 sec
        Start   6: blt_cuda_runtime_smoke
  6/953 Test   #6: blt_cuda_runtime_smoke ................................................   Passed    0.04 sec
        Start   7: blt_cuda_openmp_smoke
  7/953 Test   #7: blt_cuda_openmp_smoke .................................................   Passed    0.24 sec
        Start   8: blt_cuda_mpi_smoke
  8/953 Test   #8: blt_cuda_mpi_smoke ....................................................   Passed    0.99 sec
        Start   9: convdiff_test_test.2d.input
9/953 Test   #9: convdiff_test_test.2d.input ...........................................***Failed    2.74 sec
CUDAassert: an illegal memory access was encountered /usr/include/RAJA/policy/cuda/synchronize.hpp 42
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDAassert
[compute1-exec-204:23351] *** Process received signal ***
[compute1-exec-204:23351] Signal: Aborted (6)
[compute1-exec-204:23351] Signal code:  (-6)
[compute1-exec-204:23351] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f76ad17a980]
[compute1-exec-204:23351] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f76abe68fb7]
[compute1-exec-204:23351] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f76abe6a921]
[compute1-exec-204:23351] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c957)[0x7f76aca8c957]
[compute1-exec-204:23351] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ae6)[0x7f76aca92ae6]
[compute1-exec-204:23351] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92b21)[0x7f76aca92b21]
[compute1-exec-204:23351] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92d54)[0x7f76aca92d54]
[compute1-exec-204:23351] [ 7] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_Z19RAJA_ABORT_OR_THROWPKc+0x64)[0x55d43f912799]
[compute1-exec-204:23351] [ 8] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN4RAJA10cudaAssertE9cudaErrorPKcib+0x67)[0x55d43f91284e]
[compute1-exec-204:23351] [ 9] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN4RAJA11synchronizeINS_6policy4cuda16cuda_synchronizeEEEvv+0x34)[0x55d43ff913c9]
[compute1-exec-204:23351] [10] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4tbox11synchronizeINS0_6policy8parallelEEEvv+0x9)[0x55d43ff91307]
[compute1-exec-204:23351] [11] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4tbox20parallel_synchronizeEv+0x34)[0x55d43ff9133e]
[compute1-exec-204:23351] [12] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZNK6SAMRAI4mesh17GriddingAlgorithm8fillTagsEiRKSt10shared_ptrINS_4hier10PatchLevelEEi+0x192)[0x55d43ffa2910]
[compute1-exec-204:23351] [13] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4mesh17GriddingAlgorithm17makeCoarsestLevelEd+0x131a)[0x55d43ff9757c]
[compute1-exec-204:23351] [14] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(main+0x1ca0)[0x55d43f95ec8b]
[compute1-exec-204:23351] [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f76abe4bbf7]
[compute1-exec-204:23351] [16] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_start+0x2a)[0x55d43f903dea]
[compute1-exec-204:23351] *** End of error message ***

It seems this is caused by the code block

#if defined(HAVE_RAJA)
tbox::parallel_synchronize();
#endif

in GriddingAlgorithm::fillTags(), which is called during initialization from makeCoarsestLevel(). I am using RAJA v0.13.0; the recommended version, v0.12.1, has the same issue. All of RAJA's own tests pass in my environment.
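
One possible reading of the trace is that the synchronize call is only where the error is detected, not where it originates: CUDA kernels run asynchronously, so an illegal access in a kernel launched earlier is typically reported by the next synchronizing call. The standalone sketch below illustrates that behavior under stated assumptions. It is not SAMRAI or RAJA code, and `write_one` and `launch_then_sync` are hypothetical names made up for the illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical, deliberately faulty kernel: writes through whatever
// pointer it is given, with no validity check.
__global__ void write_one(int* out)
{
   out[threadIdx.x] = 1;
}

// Launches the faulty kernel with a null device pointer, then prints the
// status seen right after the launch and at the following synchronize.
// Returns the status reported by the synchronize.
cudaError_t launch_then_sync()
{
   write_one<<<1, 32>>>(nullptr);

   // Checks the launch itself (e.g. bad configuration). Because the kernel
   // runs asynchronously, this is usually still cudaSuccess at this point.
   cudaError_t at_launch = cudaGetLastError();

   // Waits for the kernel to finish; a fault raised while the kernel was
   // running is reported by this call.
   cudaError_t at_sync = cudaDeviceSynchronize();

   std::printf("after launch: %s\n", cudaGetErrorString(at_launch));
   std::printf("after sync:   %s\n", cudaGetErrorString(at_sync));
   return at_sync;
}
```

Calling `launch_then_sync()` from a `main()` compiled with nvcc should show the fault attributed to the synchronize. If the same applies here, the kernel at fault would be one launched before tbox::parallel_synchronize() in fillTags(), and running the convdiff binary with the environment variable CUDA_LAUNCH_BLOCKING=1 set may help narrow down which one.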

ctian282 • Aug 25 '21 04:08