LULESH Issues with Cuda 9.0

Hi there,

I am having a very strange issue running the CUDA variant of LULESH (release 2.0.2).

I'm compiling with nvcc (Cuda compilation tools, release 9.0, V9.0.176) and setting either the flag -arch=sm_35 or, to avoid compilation warnings, the flag -arch=sm_70.
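
For reference, the -arch=sm_XY value encodes a compute capability X.Y, and the Tesla V100 is a compute capability 7.0 device, so -arch=sm_70 is the flag that matches it. The sketch below only illustrates that mapping. `arch_flag` is a hypothetical helper, not part of LULESH or the CUDA toolkit, and in a real program the major/minor values would presumably come from the `major` and `minor` fields of `cudaDeviceProp`. The crash reported here occurs with either flag, so this is not offered as a fix.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of LULESH or the CUDA toolkit):
// builds the nvcc -arch flag for a compute capability major.minor,
// e.g. 7.0 (Tesla V100) becomes "-arch=sm_70".
std::string arch_flag(int major, int minor) {
    // Compute capabilities use a positive major and a single-digit minor.
    assert(major > 0 && minor >= 0 && minor <= 9);
    return "-arch=sm_" + std::to_string(major) + std::to_string(minor);
}
```

Calling `arch_flag(7, 0)` yields the sm_70 flag used above, and `arch_flag(3, 5)` yields the sm_35 one.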

When running the code on a Tesla V100-SXM2-32GB, the program crashes as follows:

$ ./lulesh -s 10
Host compute1-exec-206.ris.wustl.edu using GPU 0: Tesla V100-SXM2-32GB
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: invalid argument
[compute1-exec-206:00204] *** Process received signal ***
[compute1-exec-206:00204] Signal: Aborted (6)
[compute1-exec-206:00204] Signal code:  (-6)
[compute1-exec-206:00204] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f7e38bda390]
[compute1-exec-206:00204] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f7e37d8f428]
[compute1-exec-206:00204] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f7e37d9102a]
[compute1-exec-206:00204] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x16d)[0x7f7e386d284d]
[compute1-exec-206:00204] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6)[0x7f7e386d06b6]
[compute1-exec-206:00204] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701)[0x7f7e386d0701]
[compute1-exec-206:00204] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919)[0x7f7e386d0919]
[compute1-exec-206:00204] [ 7] ./lulesh[0x41f252]
[compute1-exec-206:00204] [ 8] ./lulesh[0x417330]
[compute1-exec-206:00204] [ 9] ./lulesh[0x41ade5]
[compute1-exec-206:00204] [10] ./lulesh[0x405cff]
[compute1-exec-206:00204] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7e37d7a830]
[compute1-exec-206:00204] [12] ./lulesh[0x409bf9]
[compute1-exec-206:00204] *** End of error message ***
Aborted (core dumped)

Has anyone else observed or reported something similar? What version of CUDA do you usually use to compile LULESH?

Thank you in advance,

Umberto

Jan 22 '20 00:01 uvilla

@uvilla I may be a little bit late here, but I observed exactly the same problem on exactly the same hardware (V100-SXM2).

It looks like this code has not been maintained for a very long time.

The good news is that disabling MPI support in the Makefile helps.

May 13 '20 23:05 miharulidze

Yes, the CUDA version is not maintained. It was an Nvidia port, and Nvidia has not been updating it. The mainline code is maintained.

May 14 '20 00:05 ikarlin