
(LLVM compiler bug) NV GPU Offload errors due to misaligned addresses

Open prckent opened this issue 1 year ago • 5 comments

Describe the bug

A wide variety of periodic Gaussian tests are failing with LLVM offload. The restart tests are also failing.

These failures appear in the nightly builds that offload to NVIDIA V100 GPUs.

See: https://cdash.qmcpack.org/viewTest.php?onlyfailed&buildid=7342

For example, deterministic-diamondC_2x1x1_pp-vmcbatch_gaussian_sdj-1-1 (https://cdash.qmcpack.org/tests/2182646):

    "PluginInterface" error: Failure to synchronize stream (nil): Error in cuStreamSynchronize: misaligned address
    omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
    SoaAtomicBasisSet.h:875:7: omptarget fatal error 1: failure of target construct while offloading is mandatory
    [sulfur:1226856] *** Process received signal ***

Removed as redundant: ~~deterministic-restart-1-16 https://cdash.qmcpack.org/tests/2181794 Anonymous Buffer size per walker : 19280 Bytes. MEMORY increase 0 MB VMC::resetRun "PluginInterface" error: Faliure to copy data from device to host. Pointers: host = 0x00007f0df17bf3e4, device = 0x00007f0df20a9c00, size = 8: Error in cuMemcpyDtoHAsync: misaligned address omptarget error: Copying data from device failed. omptarget error: Call to targetDataEnd failed, abort target. omptarget error: Failed to process data after launching the kernel. omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options. "PluginInterface" error: ompBLAS.cpp:649:3: omptarget fatal error 1: failure of target construct while offloading is mandatory Failure to synchronize stream (nil): Error in cuStreamSynchronize: misaligned address~~
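For readers unfamiliar with the error: the CUDA driver reports `misaligned address` (`CUDA_ERROR_MISALIGNED_ADDRESS`) when a kernel executes a load or store on an address that is not aligned for the width of the access, for example an 8-byte `double` load from an address that is not a multiple of 8. Kernel errors are reported asynchronously, so they typically surface at a later driver call such as `cuStreamSynchronize` above. The helpers below are a hypothetical host-side sketch of the natural-alignment rule (the names `is_power_of_two`, `is_aligned`, and `is_aligned_for` are illustrative, not QMCPACK or LLVM code); they do not identify which access near `SoaAtomicBasisSet.h:875` was misaligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (not from QMCPACK or LLVM) illustrating natural alignment.

// True if n is a nonzero power of two; valid alignments always are.
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// True if `address` is a multiple of `alignment`. For a power-of-two
// alignment this is the same as the low log2(alignment) bits being zero.
inline bool is_aligned(std::uintptr_t address, std::size_t alignment) {
    assert(is_power_of_two(alignment) && "alignment must be a power of two");
    return (address & (alignment - 1)) == 0;
}

// True if an object of type T could be accessed at `p` with T's natural
// alignment, e.g. an 8-byte boundary for double on common 64-bit platforms.
template <typename T>
bool is_aligned_for(const void* p) {
    return is_aligned(reinterpret_cast<std::uintptr_t>(p), alignof(T));
}
```

For example, `is_aligned(0x1004, 4)` is true but `is_aligned(0x1004, 8)` is false, so an 8-byte access starting at address 0x1004 would be misaligned.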

To Reproduce

Ask for the latest software versions if they are not clear from CDash.

Expected behavior

Tests should pass.

System: sulfur

prckent, Aug 21 '24 18:08

Using LLVM 18.1.8

prckent, Aug 21 '24 18:08

With Clang, -DCMAKE_BUILD_TYPE=Debug does not add any -Ox optimization flag, so the compiler's default, -O0, is used. I can reproduce the issue, and after adding -O3 via -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS=-O3 the error disappears. So it is a compiler issue, not a QMCPACK source code issue.
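One way to confirm whether a given build is affected is to check the optimization level at compile time. The sketch below is hypothetical (the names `built_with_optimization` and `report_optimization_level` are illustrative, not QMCPACK code). It relies on GCC and Clang predefining `__OPTIMIZE__` whenever an optimization level above `-O0` is in effect, so with Clang a file built with `-DCMAKE_BUILD_TYPE=Debug` alone should report "no", while one built with `-DCMAKE_CXX_FLAGS=-O3` added should report "yes".

```cpp
#include <cstdio>

// Hypothetical helper (not part of QMCPACK): reports whether this translation
// unit was compiled with optimization. GCC and Clang predefine __OPTIMIZE__
// at -O1 and above (including -Os/-Oz) and leave it undefined at -O0.
constexpr bool built_with_optimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Prints a one-line summary, e.g. from a test driver's startup code.
inline void report_optimization_level() {
    std::printf("compiled with optimization: %s\n",
                built_with_optimization() ? "yes" : "no");
}
```

The check only describes the flags used for the file that contains it, so it is most useful when compiled with the same CMake configuration as the failing sources.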

ye-luo, Aug 21 '24 20:08

Any chance for a small reproducer? Can you make an issue on the relevant repo and link it here?

prckent, Aug 21 '24 20:08

> Any chance for a small reproducer? Can you make an issue on the relevant repo and link it here?

Unfortunately, it will be a very, very low priority for me.

ye-luo, Aug 21 '24 21:08

No worries.

prckent, Aug 22 '24 00:08