Issues with CUDA 9.0
Hi there,
I am having a very strange issue running the CUDA variant of LULESH (release 2.0.2).
I'm compiling with Cuda compilation tools, release 9.0, V9.0.176, and setting either the flag -arch=sm_35 or, to avoid compilation warnings, the flag -arch=sm_70.
When running the code on a Tesla V100-SXM2-32GB, the program crashes as follows:
$ ./lulesh -s 10
Host compute1-exec-206.ris.wustl.edu using GPU 0: Tesla V100-SXM2-32GB
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: invalid argument
[compute1-exec-206:00204] *** Process received signal ***
[compute1-exec-206:00204] Signal: Aborted (6)
[compute1-exec-206:00204] Signal code: (-6)
[compute1-exec-206:00204] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f7e38bda390]
[compute1-exec-206:00204] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f7e37d8f428]
[compute1-exec-206:00204] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f7e37d9102a]
[compute1-exec-206:00204] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x16d)[0x7f7e386d284d]
[compute1-exec-206:00204] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6)[0x7f7e386d06b6]
[compute1-exec-206:00204] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701)[0x7f7e386d0701]
[compute1-exec-206:00204] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919)[0x7f7e386d0919]
[compute1-exec-206:00204] [ 7] ./lulesh[0x41f252]
[compute1-exec-206:00204] [ 8] ./lulesh[0x417330]
[compute1-exec-206:00204] [ 9] ./lulesh[0x41ade5]
[compute1-exec-206:00204] [10] ./lulesh[0x405cff]
[compute1-exec-206:00204] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7e37d7a830]
[compute1-exec-206:00204] [12] ./lulesh[0x409bf9]
[compute1-exec-206:00204] *** End of error message ***
Aborted (core dumped)
Has anyone else observed or reported something similar? What version of CUDA do you usually use to compile LULESH?
Thank you in advance,
Umberto
@uvilla I may be a little bit late here, but I observed exactly the same problem on exactly the same hardware (V100-SXM2).
It looks like this code has not been maintained for a very long time.
The good news is that disabling MPI support in the Makefile helps.
Yes, the CUDA version is not maintained. It was an Nvidia port, and Nvidia has not been updating it. The mainline code is maintained.
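
For anyone wondering about the two -arch values mentioned in the original post: the number after sm_ is the GPU's compute capability with the dot removed. The V100 is compute capability 7.0, hence sm_70, while sm_35 corresponds to older Kepler-class cards. Below is a minimal sketch of that mapping in Python. The table and the function names (KNOWN_GPUS, arch_flag, arch_flag_for) are hypothetical and only for illustration; they are not part of LULESH or of the CUDA toolkit. Note that the crash above was reported with both flags, so nothing here suggests the flag alone is the cause.

```python
# Hypothetical helper: map a GPU name to the matching nvcc -arch flag.
# The compute capabilities listed are the publicly documented ones for
# these cards; the table is deliberately tiny and not exhaustive.
KNOWN_GPUS = {
    "Tesla K40": (3, 5),
    "Tesla P100": (6, 0),
    "Tesla V100": (7, 0),
}


def arch_flag(major, minor):
    """Build the -arch flag for a compute capability such as 7.0."""
    return f"-arch=sm_{major}{minor}"


def arch_flag_for(gpu_name):
    """Match a device name by prefix and return its -arch flag."""
    for known, (major, minor) in KNOWN_GPUS.items():
        if gpu_name.startswith(known):
            return arch_flag(major, minor)
    raise KeyError(f"unknown GPU: {gpu_name}")


# Device name as printed in the log above.
print(arch_flag_for("Tesla V100-SXM2-32GB"))  # -arch=sm_70
```

The prefix match is only there so that the full device string from the log ("Tesla V100-SXM2-32GB") resolves to the V100 entry.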