Segfault while running the hypre tutorial using CUDA GPU
Hello,
We hit a segfault when running the amrex-tutorials/ExampleCodes/LinearSolvers/ABecLaplacian_C example with inputs.hypre as the input file. We use the latest amrex, amrex-tutorials, and hypre. Our CUDA version is 11.2.
Here is the GNUmakefile we used to compile the code. We build in single precision (PRECISION = FLOAT) with CUDA and HYPRE enabled.
DEBUG = TRUE
USE_MPI = FALSE
USE_OMP = FALSE
USE_HYPRE = TRUE
USE_PETSC = FALSE
USE_CUDA = TRUE
COMP = gnu
DIM = 3
PRECISION = FLOAT
AMREX_HOME ?= ../../../../amrex
include $(AMREX_HOME)/Tools/GNUMake/Make.defs
include ./Make.package
Pdirs := Base Boundary LinearSolvers/MLMG
Ppack += $(foreach dir, $(Pdirs), $(AMREX_HOME)/Src/$(dir)/Make.package)
include $(Ppack)
include $(AMREX_HOME)/Tools/GNUMake/Make.rules
Here is the output in Backtrace.0:
=== If no file names and line numbers are shown below, one can run
addr2line -Cpfie my_exefile my_line_address
to convert `my_line_address` (e.g., 0x4a6b) into file name and line number.
Or one can use amrex/Tools/Backtrace/parse_bt.py.
=== Please note that the line number reported by addr2line may not be accurate.
One can use
readelf -wl my_exefile | grep my_line_address'
to find out the offset for that line.
0: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x425a68) [0x56141cae5a68]
amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at /home/zengx372/cfdx/submods/amrex/Src/Base/AMReX_BLBackTrace.cpp:175
1: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x42554d) [0x56141cae554d]
amrex::BLBackTrace::handler(int) at /home/zengx372/cfdx/submods/amrex/Src/Base/AMReX_BLBackTrace.cpp:85
2: /lib/x86_64-linux-gnu/libc.so.6(+0x43090) [0x7f1769955090]
3: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0xb2e6e8) [0x56141d1ee6e8]
?? ??:0
4: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0xb30aae) [0x56141d1f0aae]
?? ??:0
5: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0xb1ea77) [0x56141d1dea77]
?? ??:0
6: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x63338) [0x56141c723338]
amrex::HypreABecLap2::prepareSolver() at /home/zengx372/cfdx/submods/amrex/Src/Extern/HYPRE/AMReX_HypreABecLap2.cpp:220
7: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x5d0a5) [0x56141c71d0a5]
amrex::HypreABecLap2::solve(amrex::MultiFab&, amrex::MultiFab const&, float, float, int, amrex::BndryData const&, int) at /home/zengx372/cfdx/submods/amrex/Src/Extern/HYPRE/AMReX_HypreABecLap2.cpp:46
8: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x4be050) [0x56141cb7e050]
amrex::MLMG::bottomSolveWithHypre(amrex::MultiFab&, amrex::MultiFab const&) at /home/zengx372/cfdx/submods/amrex/Src/LinearSolvers/MLMG/AMReX_MLMG.cpp:1890 (discriminator 4)
9: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x4b7647) [0x56141cb77647]
amrex::MLMG::actualBottomSolve() at /home/zengx372/cfdx/submods/amrex/Src/LinearSolvers/MLMG/AMReX_MLMG.cpp:957
10: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x4b6ed5) [0x56141cb76ed5]
amrex::MLMG::bottomSolve() at /home/zengx372/cfdx/submods/amrex/Src/LinearSolvers/MLMG/AMReX_MLMG.cpp:888
11: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x4a6123) [0x56141cb66123]
amrex::MLMG::mgVcycle(int, int) at /home/zengx372/cfdx/submods/amrex/Src/LinearSolvers/MLMG/AMReX_MLMG.cpp:454
12: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x4a4eff) [0x56141cb64eff]
amrex::MLMG::oneIter(int) at /home/zengx372/cfdx/submods/amrex/Src/LinearSolvers/MLMG/AMReX_MLMG.cpp:266
13: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x4a3f3f) [0x56141cb63f3f]
amrex::MLMG::solve(amrex::Vector<amrex::MultiFab*, std::allocator<amrex::MultiFab*> > const&, amrex::Vector<amrex::MultiFab const*, std::allocator<amrex::MultiFab const*> > const&, float, float, char const*) at /home/zengx372/cfdx/submods/amrex/Src/LinearSolvers/MLMG/AMReX_MLMG.cpp:130
14: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0xf652e) [0x56141c7b652e]
MyTest::solveABecLaplacian() at /home/zengx372/cfdx/submods/amrex-tutorials/ExampleCodes/LinearSolvers/ABecLaplacian_C_w_hypre/MyTest.cpp:255 (discriminator 6)
15: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0xf45fa) [0x56141c7b45fa]
MyTest::solve() at /home/zengx372/cfdx/submods/amrex-tutorials/ExampleCodes/LinearSolvers/ABecLaplacian_C_w_hypre/MyTest.cpp:28
16: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0xf3019) [0x56141c7b3019]
main at /home/zengx372/cfdx/submods/amrex-tutorials/ExampleCodes/LinearSolvers/ABecLaplacian_C_w_hypre/main.cpp:13
17: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f1769936083]
18: ./main3d.gnu.FLOAT.DEBUG.CUDA.ex(+0x4049e) [0x56141c70049e]
?? ??:0
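Frames 3 to 5 (and 18) above are the ones AMReX could not resolve to a file and line. As a rough sketch, the helper below pulls the module offsets of such frames out of a backtrace file so they can be looked up by other means. The function name `unresolved_offsets` is made up for illustration, and the parsing assumes the frame layout shown above (`N: module(+0xOFFSET) [address]` followed by a `?? ??:0` line).

```shell
#!/bin/sh
# Print the module offset of every frame that is followed by "?? ??:0",
# i.e. every frame that could not be mapped to a source location.
# $1: path to a backtrace file such as Backtrace.0
unresolved_offsets() {
    awk '
        # Remember the most recent "N: module(+0x...) [0x...]" frame line.
        /^[ \t]*[0-9]+: / { frame = $0 }
        # An unresolved marker refers to the frame remembered above.
        /^[ \t]*\?\? \?\?:0/ {
            if (match(frame, /\(\+0x[0-9a-f]+\)/))
                # Strip the leading "(+" and the trailing ")".
                print substr(frame, RSTART + 2, RLENGTH - 3)
        }
    ' "$1"
}
```

One possible use, assuming gdb is available: `for a in $(unresolved_offsets Backtrace.0); do gdb -batch -ex "info symbol $a" ./main3d.gnu.FLOAT.DEBUG.CUDA.ex; done`. This may at least name the enclosing function when the symbol table is present, even if that code was built without debug info.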
Steps to reproduce the bug:
- Make sure CUDA is installed correctly.
- Download the latest hypre from https://github.com/hypre-space/hypre.git to your home directory
- Use CMake to configure, build, and install hypre.
cd ~/hypre/src/cmbuild
cmake -DHYPRE_WITH_MPI=OFF -DHYPRE_WITH_OPENMP=OFF -DHYPRE_BUILD_EXAMPLES=OFF -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CXX_STANDARD=17 -DHYPRE_ENABLE_SINGLE=ON -DHYPRE_WITH_CUDA=ON -DCUDA_HOME="/usr/local/cuda" -DHYPRE_CUDA_SM=60 ..
cmake --build . --config release -j
cmake --install . --prefix ~/hypre/src/hypre
- Build the tutorial with HYPRE using the GNUmakefile above (see https://amrex-codes.github.io/amrex/tutorials_html/Hypre_Install.html), then run it with inputs.hypre.
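Since the tutorial is built with PRECISION = FLOAT and USE_CUDA = TRUE, the hypre install has to be configured to match. A quick sanity check, sketched below, is to look at the generated HYPRE_config.h. The helper name `hypre_defines` is hypothetical. The header location (`include/HYPRE_config.h` under the install prefix from the steps above) and the macro names `HYPRE_SINGLE` and `HYPRE_USING_CUDA` are what we would expect from a hypre CMake build, so treat them as assumptions and confirm against your own header.

```shell
#!/bin/sh
# Return success if the given header contains "#define <macro>".
# $1: path to HYPRE_config.h
# $2: macro name to look for (e.g. HYPRE_SINGLE)
hypre_defines() {
    grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+$2([[:space:]]|\$)" "$1"
}

# Report whether each expected macro is present in the header.
# $1: path to HYPRE_config.h (assumed install location, adjust as needed)
check_hypre_config() {
    for macro in HYPRE_SINGLE HYPRE_USING_CUDA; do
        if hypre_defines "$1" "$macro"; then
            echo "$macro: defined"
        else
            echo "$macro: NOT defined"
        fi
    done
}
```

For the install prefix used above this would be run as `check_hypre_config ~/hypre/src/hypre/include/HYPRE_config.h`. A "NOT defined" line would suggest that the corresponding CMake option did not take effect.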