SAMRAI
RAJA CUDA illegal memory access for the ConvDiff test
After successfully compiling SAMRAI with RAJA and CUDA, many tests fail, including the ConvDiff test, which produces the following error message:
1/953 Test #1: blt_gtest_smoke ....................................................... Passed 0.00 sec
Start 2: blt_fruit_smoke
2/953 Test #2: blt_fruit_smoke ....................................................... Passed 0.00 sec
Start 3: blt_openmp_smoke
3/953 Test #3: blt_openmp_smoke ...................................................... Passed 0.00 sec
Start 4: blt_mpi_smoke
4/953 Test #4: blt_mpi_smoke ......................................................... Passed 0.36 sec
Start 5: blt_cuda_smoke
5/953 Test #5: blt_cuda_smoke ........................................................ Passed 0.21 sec
Start 6: blt_cuda_runtime_smoke
6/953 Test #6: blt_cuda_runtime_smoke ................................................ Passed 0.04 sec
Start 7: blt_cuda_openmp_smoke
7/953 Test #7: blt_cuda_openmp_smoke ................................................. Passed 0.24 sec
Start 8: blt_cuda_mpi_smoke
8/953 Test #8: blt_cuda_mpi_smoke .................................................... Passed 0.99 sec
Start 9: convdiff_test_test.2d.input
9/953 Test #9: convdiff_test_test.2d.input ...........................................***Failed 2.74 sec
CUDAassert: an illegal memory access was encountered /usr/include/RAJA/policy/cuda/synchronize.hpp 42
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDAassert
[compute1-exec-204:23351] *** Process received signal ***
[compute1-exec-204:23351] Signal: Aborted (6)
[compute1-exec-204:23351] Signal code: (-6)
[compute1-exec-204:23351] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f76ad17a980]
[compute1-exec-204:23351] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f76abe68fb7]
[compute1-exec-204:23351] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f76abe6a921]
[compute1-exec-204:23351] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c957)[0x7f76aca8c957]
[compute1-exec-204:23351] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ae6)[0x7f76aca92ae6]
[compute1-exec-204:23351] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92b21)[0x7f76aca92b21]
[compute1-exec-204:23351] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92d54)[0x7f76aca92d54]
[compute1-exec-204:23351] [ 7] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_Z19RAJA_ABORT_OR_THROWPKc+0x64)[0x55d43f912799]
[compute1-exec-204:23351] [ 8] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN4RAJA10cudaAssertE9cudaErrorPKcib+0x67)[0x55d43f91284e]
[compute1-exec-204:23351] [ 9] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN4RAJA11synchronizeINS_6policy4cuda16cuda_synchronizeEEEvv+0x34)[0x55d43ff913c9]
[compute1-exec-204:23351] [10] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4tbox11synchronizeINS0_6policy8parallelEEEvv+0x9)[0x55d43ff91307]
[compute1-exec-204:23351] [11] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4tbox20parallel_synchronizeEv+0x34)[0x55d43ff9133e]
[compute1-exec-204:23351] [12] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZNK6SAMRAI4mesh17GriddingAlgorithm8fillTagsEiRKSt10shared_ptrINS_4hier10PatchLevelEEi+0x192)[0x55d43ffa2910]
[compute1-exec-204:23351] [13] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_ZN6SAMRAI4mesh17GriddingAlgorithm17makeCoarsestLevelEd+0x131a)[0x55d43ff9757c]
[compute1-exec-204:23351] [14] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(main+0x1ca0)[0x55d43f95ec8b]
[compute1-exec-204:23351] [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f76abe4bbf7]
[compute1-exec-204:23351] [16] /scratch1/fs1/jmertens/SAMRAI/build/bin/convdiff(_start+0x2a)[0x55d43f903dea]
[compute1-exec-204:23351] *** End of error message ***
It seems this is caused by the code block
#if defined(HAVE_RAJA)
tbox::parallel_synchronize();
#endif
in GriddingAlgorithm::fillTags(), which is called during initialization from makeCoarsestLevel(). I am using RAJA v0.13.0; the recommended version, v0.12.1, shows the same issue. All of RAJA's own tests pass in my environment.