Ginkgo+cuda fails on Ascent if build type is not Debug

Open maksud opened this issue 2 years ago • 1 comments

Ginkgo with CUDA fails when HiOp is built with any build type other than Debug. Spack builds with RelWithDebInfo by default, so the default spack installation fails with the following message:

bash-4.4$ jsrun -n1 -g1 ./src/Drivers/Sparse/NlpSparseEx2.exe 500 -ginkgo_cuda
[h49n01:1092778] *** Process received signal ***
[h49n01:1092778] Signal: Segmentation fault (11)
[h49n01:1092778] Signal code: Invalid permissions (2)
[h49n01:1092778] Failing at address: 0x200112e20400
[h49n01:1092778] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8]
[h49n01:1092778] [ 1] ./src/Drivers/Sparse/NlpSparseEx2.exe[0x101c236c]
[h49n01:1092778] [ 2] ./src/Drivers/Sparse/NlpSparseEx2.exe[0x101c370c]
[h49n01:1092778] [ 3] ./src/Drivers/Sparse/NlpSparseEx2.exe[0x10172e6c]
[h49n01:1092778] [ 4] ./src/Drivers/Sparse/NlpSparseEx2.exe[0x10176bc8]
[h49n01:1092778] [ 5] ./src/Drivers/Sparse/NlpSparseEx2.exe[0x1017530c]
[h49n01:1092778] [ 6] ./src/Drivers/Sparse/NlpSparseEx2.exe[0x1016bb64]
[h49n01:1092778] [ 7] ./src/Drivers/Sparse/NlpSparseEx2.exe[0x10048a50]
[h49n01:1092778] [ 8] /lib64/power9/libc.so.6(+0x24078)[0x20000ef54078]
[h49n01:1092778] [ 9] /lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x20000ef54264]
[h49n01:1092778] *** End of error message ***
ERROR:  One or more process (first noticed rank 0) terminated with signal 11 (core dumped)

However, setting the build type to Debug does not resolve the other Ginkgo issue seen on Summit, Marianas, etc., which produces the following error:

Setting up Ginkgo solver ...
terminate called after throwing an instance of 'gko::CudaError'
  what():  /tmp/fda/spack-stage/spack-stage-ginkgo-glu_experimental-qn2vsrqqmaahic7saq25wbxxjypw4gj5/spack-src/cuda/base/executor.cpp:194: raw_copy_to: cudaErrorInvalidValue: invalid argument
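
For anyone trying to reproduce the first failure, here is a minimal sketch of pinning the build type at install time rather than relying on spack's default. The `build_type` variant is the standard one for CMake-based spack packages. The other variant names (`+sparse`, `+ginkgo`, `+cuda`) and `cuda_arch=70` are assumptions made for illustration and are not taken from this report, so check `spack info hiop` before using them. The script only prints the command unless `RUN_INSTALL=1` is set.

```shell
#!/usr/bin/env bash
# Sketch: pin the CMake build type when installing HiOp through spack.
# Assumptions (not from the issue): the variants +sparse, +ginkgo, +cuda and
# cuda_arch=70 are illustrative; confirm them with `spack info hiop`.
set -euo pipefail

# Debug is the build type reported to work; RelWithDebInfo is the default
# that triggers the segmentation fault described in this issue.
BUILD_TYPE="${BUILD_TYPE:-Debug}"
SPEC="hiop +sparse +ginkgo +cuda cuda_arch=70 build_type=${BUILD_TYPE}"

if [ "${RUN_INSTALL:-0}" = "1" ]; then
    # Word splitting of SPEC is intentional: each token is one spec argument.
    # shellcheck disable=SC2086
    spack install ${SPEC}
else
    # Dry run: only print the command that would be executed.
    echo "spack install ${SPEC}"
fi
```

Setting `BUILD_TYPE=RelWithDebInfo` gives the default configuration in which the segmentation fault was reported. As noted above, a Debug build does not address the `gko::CudaError` failure.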

maksud • Sep 02 '22 18:09

Tagging: @pelesh @nkoukpaizan @cnpetra @fritzgoebel @nychiang @CameronRutherford

maksud • Sep 02 '22 18:09

@maksud, please check whether #548 fixes the issue you reported. Ginkgo tests pass on Ascent.

pelesh • Sep 27 '22 20:09

Fixed in #548 and #551.

pelesh • Nov 30 '22 23:11